Monitoring

Because your SAN is a critical asset in your IT environment, you want it to run as smoothly and trouble-free as possible.

Unlike a TCP/IP network, a Fibre Channel SAN is:

  • sensitive to frame drop
  • redundant but not necessarily error-free
  • sensitive to link quality issues

To address these problems we built a Fibre Channel SAN monitoring framework. Its main goals are:

  • increase availability by preventing problems from occurring (for example, by spotting gradually decreasing levels)
  • shorten troubleshooting time after a sudden component failure
  • provide a means for capacity planning

The problems to expect are wide-ranging:

  • SFP degradation over time
  • Link quality issues
  • ISL capacity limit reached
  • Host port capacity reached
  • Fabric instability
  • Slow-draining devices and head-of-line blocking
  • Environmental problems
  • FRU failure
  • Buffer to Buffer credit starvation

How do we address these problems and keep them from occurring? By monitoring:

  • Port speed/type/state
  • SFP RX and TX power
  • Port usage percentage
  • Port error counters
  • Port Buffer to Buffer credits
  • Power supply / fan / temperature sensors

Thresholds are set to warn you by SMS/mail, and a dashboard provides an overview of the entire environment. You’re not alone when disaster strikes: we also get the alerts and can assist in resolving the problem at hand. Moreover, we prefer to keep an eye on your environment together with you, to identify potential issues and resolve them before they turn into troublemakers.
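To give an idea of how a threshold on port usage percentage could be evaluated, here is a minimal sketch in Python. It is an illustration only, not the framework's actual code: the function names, the warning/critical levels and the link capacity figure are all assumptions, and the byte counters are assumed to come from two consecutive polls of the switch.

```python
from typing import Optional


def port_usage_percent(bytes_prev: int, bytes_now: int,
                       interval_s: float,
                       capacity_bytes_per_s: float) -> Optional[float]:
    """Average port usage over one polling interval, as a percentage.

    All parameters are hypothetical: two readings of a byte counter,
    the time between them, and the usable link capacity.
    """
    if interval_s <= 0 or capacity_bytes_per_s <= 0:
        raise ValueError("interval and capacity must be positive")
    delta = bytes_now - bytes_prev
    if delta < 0:
        # Counter was reset or wrapped; skip the sample rather than guess.
        return None
    return 100.0 * delta / (interval_s * capacity_bytes_per_s)


def classify(value: float, warn: float, crit: float) -> str:
    """Map a measured value onto 'ok', 'warning' or 'critical'."""
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "ok"


# Example: 24 GB transferred in 60 s on a link assumed to carry 800 MB/s.
usage = port_usage_percent(0, 24_000_000_000, 60, 800_000_000)
state = classify(usage, warn=70.0, crit=90.0)  # assumed threshold levels
```

The "warning" state would typically feed the dashboard and mail, while "critical" could also trigger an SMS; where exactly those levels sit depends on the environment.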

As a few screenshots say more than a thousand words, here are some examples from a real-life monitoring setup.

The screenshots below provide an overview of the overall FC switch status.

[Screenshot: SWF2S1_Memory]

[Screenshot: SWF2S1_CPU]

Per FC switch, an overview of the currently active triggers is shown:

[Screenshot: Switch_Triggers]

The events that occurred in the selected timeframe:

[Screenshot: Events_log]

And some screenshots of the metrics of an ISL port:

[Screenshot: SWF2S1_Port_Traffic_8]

[Screenshot: SWF2S1_Port_Usage_8]

The minimum across all VCs on the ISL (as that’s what we are interested in) is monitored to make sure we don’t run out of Buffer to Buffer credits (BBC):

[Screenshot: SWF2S1_Port_BBC_8]
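The idea of watching the minimum over all VCs can be sketched as follows. This is only an illustration under assumptions: we assume one "remaining credits" reading per VC is available for the ISL, and the function names and the warning level are hypothetical.

```python
from typing import Dict, Tuple


def lowest_vc_credit(credits_by_vc: Dict[int, int]) -> Tuple[int, int]:
    """Return (vc, credits) for the VC with the fewest credits left."""
    if not credits_by_vc:
        raise ValueError("no VC readings supplied")
    vc = min(credits_by_vc, key=credits_by_vc.get)
    return vc, credits_by_vc[vc]


def bbc_running_low(credits_by_vc: Dict[int, int], warn_below: int) -> bool:
    """True when the worst VC has dropped below the assumed warning level."""
    _, credits = lowest_vc_credit(credits_by_vc)
    return credits < warn_below


# Example readings for one ISL (hypothetical values).
reading = {2: 8, 3: 1, 4: 8, 5: 7}
worst_vc, worst_credits = lowest_vc_credit(reading)
alert = bbc_running_low(reading, warn_below=2)
```

Taking the minimum rather than the average matters here: a single starved VC can stall traffic even while the other VCs on the same ISL still have plenty of credits.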

SFP levels are closely monitored too:

[Screenshot: SWF2S1_SFP_RX_TX_Power_8]

[Screenshot: SWF2S1_SFP_Voltage_8]

[Screenshot: SWF2S1_SFP_Temperature_8]
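Gradual SFP degradation shows up as a slow downward trend in RX or TX power. One possible way to catch it early, sketched below, is to fit a straight line through recent samples and estimate when the assumed threshold would be crossed. The least-squares fit, the function names, the sample values and the threshold are assumptions chosen for illustration; they are not taken from the monitoring setup shown in the screenshots.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in days, power in dBm)


def slope_per_day(samples: List[Sample]) -> float:
    """Least-squares slope of power over time, in dBm per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    num = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    return num / den


def days_until_threshold(samples: List[Sample],
                         threshold_dbm: float) -> Optional[float]:
    """Rough estimate of the days left before power drops to the threshold.

    Returns 0.0 if the last sample is already at or below the threshold,
    and None if the trend is flat or rising.
    """
    last_power = samples[-1][1]
    if last_power <= threshold_dbm:
        return 0.0
    slope = slope_per_day(samples)
    if slope >= 0:
        return None
    return (threshold_dbm - last_power) / slope


# Hypothetical RX power readings, one per day.
history = [(0.0, -3.0), (1.0, -3.5), (2.0, -4.0)]
remaining = days_until_threshold(history, threshold_dbm=-5.0)
```

A linear extrapolation is a crude model, so the result is best read as an early hint to plan an SFP replacement rather than as a precise prediction.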

And, needless to say, port errors are monitored even more closely:

[Screenshot: SWF2S1_Port_Errors_All_8]

And last but not least, SNMP traps can be received and, according to their severity level, forwarded by mail/SMS as well.
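Severity-based forwarding of traps could look roughly like the sketch below. It assumes the trap has already been received and decoded into a severity label; the severity names, their ordering, the channel names and the default cut-off levels are all hypothetical.

```python
from typing import List

# Assumed severity scale, from least to most severe.
SEVERITIES = ["info", "warning", "error", "critical"]


def channels_for(severity: str,
                 mail_from: str = "warning",
                 sms_from: str = "critical") -> List[str]:
    """Return the channels a trap of this severity is forwarded to.

    A trap goes to a channel when its severity is at or above that
    channel's cut-off. Unknown severity labels raise ValueError.
    """
    rank = SEVERITIES.index(severity)
    channels = []
    if rank >= SEVERITIES.index(mail_from):
        channels.append("mail")
    if rank >= SEVERITIES.index(sms_from):
        channels.append("sms")
    return channels


# Example: an "error" trap is mailed, a "critical" one is mailed and texted.
error_channels = channels_for("error")
critical_channels = channels_for("critical")
```

Keeping SMS for the highest severities only is a deliberate choice in this sketch: it keeps the phone quiet for events that can wait until someone reads their mail.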