Friday, August 21, 2020

Cisco SAN Analytics and Telemetry Streaming - Why Should I Use Them?


Are you sometimes overwhelmed by performance problems on your Storage Network?  Do you wish you had better data on how your network is performing?  If you answered yes to either of these questions, read on to find out about Cisco SAN Analytics and Telemetry Streaming.    


The Cisco SAN Analytics engine is available on Cisco 32 Gbps and faster MDS 9700 series port modules and on the 32 Gbps standalone switches.  This engine constantly samples the traffic running through the switches and provides a wealth of statistics that can be used to analyze your Cisco or IBM C-Type fabric.  Telemetry Streaming allows you to use an external application such as Cisco Data Center Network Manager (DCNM) to sample and visualize the data that the analytics engine generates, so that you can find patterns in your performance data, identify problems, or predict the likelihood of a problem occurring.


You can find an overview of both SAN Analytics and Telemetry Streaming here.  That link also includes a complete list of the hardware that SAN Analytics is supported on.


In this blog post we'll take a quick look at the more important reasons to use SAN Analytics and the Telemetry Streaming features.


Find The Slow Storage and Host Ports on the SAN

This is probably the most common use case for any performance monitoring software.  We want to identify the outlier storage or host ports in the path of slow I/O transactions.  In this case, slowness is defined as a longer I/O completion time or exchange completion time.  Both of these are measures of how long it takes to complete a write or read operation.  While there are several potential reasons for slow I/O, one of the more common ones is slow or stuck device ports.  A host or storage port continually running out of buffer credits is a common cause of performance issues.  SAN Analytics makes it much easier to identify these slow ports.
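
As a rough illustration of what picking out the outlier ports can look like once you have the data, the sketch below flags ports whose average exchange completion time is well above the median across all ports.  The port names, the sample values, the input format and the 3x threshold are all hypothetical.  This is not how SAN Analytics works internally, just one simple way to post-process exported ECT figures.

```python
from statistics import mean, median


def find_slow_ports(ect_samples_us, factor=3.0):
    """Return (port, mean ECT) pairs whose mean ECT exceeds factor x the median.

    ect_samples_us: dict mapping a port name to a list of ECT samples in
    microseconds. This input format is an assumption for the example.
    """
    # Average the samples per port, skipping ports with no samples.
    port_means = {
        port: mean(samples)
        for port, samples in ect_samples_us.items()
        if samples
    }
    if not port_means:
        return []
    # Use the median of the per-port averages as the "normal" baseline,
    # so that a single very slow port does not skew the baseline itself.
    baseline = median(port_means.values())
    slow = [(port, avg) for port, avg in port_means.items() if avg > factor * baseline]
    # Worst port first.
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Hypothetical samples: fc1/3 is clearly slower than its peers.
samples = {
    "fc1/1": [400, 450, 420],
    "fc1/2": [380, 410, 395],
    "fc1/3": [2500, 3100, 2800],
    "fc1/4": [430, 440, 415],
}
slow_ports = find_slow_ports(samples)
```

Comparing against the median rather than a fixed number of microseconds is a deliberate choice here, since what counts as a normal ECT varies a lot between workloads and storage systems.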

 

Find The Busiest Host and Storage Ports

You can use SAN Analytics to identify the busy ports on your SAN.  This enables you to monitor the busy devices and proactively plan capacity expansion to address the high utilization before it impacts application performance.  If you have a host that has very high port utilization, you need to know this before adding more workload to the host.  If you have storage ports that have very high port utilization, perhaps you can load balance your hosts differently to spread the load across the storage ports so that a few ports aren't busier than the rest of them.

 

It is important to note that busy ports are not automatically slow ports.  Your busy ports may be keeping up with the current load that is placed on them.   However, if the load increases, or a storage system where all ports are busy has a link fail, the remaining ports may not have enough available capacity to meet that demand.  SAN Analytics can help identify these ports.  
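
The link-failure scenario is easy to reason about with a little arithmetic.  The sketch below is a minimal example that assumes the load from a failed port is spread evenly over the surviving ports and that you want to stay under a chosen utilization ceiling.  The port names, throughput figures and the 80% ceiling are made up for illustration.

```python
def survives_link_failure(port_load_gbps, port_speed_gbps, max_utilization=0.8):
    """Check whether the remaining ports can absorb the load if one port fails.

    port_load_gbps: dict of current throughput per storage port, in Gbps
    (hypothetical input). Assumes the total load is redistributed evenly
    across the surviving ports, which real multipathing may not do.
    """
    if len(port_load_gbps) < 2:
        # With a single port there is nothing left to fail over to.
        return False
    total_load = sum(port_load_gbps.values())
    surviving_ports = len(port_load_gbps) - 1
    load_per_port_after_failure = total_load / surviving_ports
    return load_per_port_after_failure <= port_speed_gbps * max_utilization


# Four busy 32 Gbps ports: fine today, but not after losing one link.
busy = {"p1": 20, "p2": 22, "p3": 21, "p4": 19}
busy_ok = survives_link_failure(busy, port_speed_gbps=32)

# The same four ports under a lighter load have room to spare.
light = {"p1": 12, "p2": 14, "p3": 13, "p4": 11}
light_ok = survives_link_failure(light, port_speed_gbps=32)
```

In the first case each port is individually below the ceiling, yet the three survivors would each need to carry roughly 27 Gbps, which is the situation described above: busy but not slow, until something fails.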


Related to this is verifying that multi-pathing (MPIO) is working properly on your hosts.  SAN Analytics can help you determine if all host paths to storage are active, and if they are, whether the utilization is uniform across all of the paths.
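
A multipath check like this boils down to two questions per host: is any path idle, and is any active path carrying far more or less than the others?  The following sketch shows one way to answer both from per-path I/O rates.  The path names, the IOPS numbers and the 25% tolerance are assumptions for the example, not values taken from SAN Analytics.

```python
from statistics import median


def check_path_balance(path_iops, tolerance=0.25):
    """Report inactive and imbalanced paths for a single host.

    path_iops: dict mapping a path name to its observed IOPS (hypothetical
    input). A path is 'imbalanced' if it deviates from the median of the
    active paths by more than `tolerance` (as a fraction of that median).
    """
    inactive = sorted(path for path, iops in path_iops.items() if iops == 0)
    active = {path: iops for path, iops in path_iops.items() if iops > 0}
    imbalanced = []
    if active:
        typical = median(active.values())
        imbalanced = sorted(
            path
            for path, iops in active.items()
            if abs(iops - typical) > tolerance * typical
        )
    return {"inactive": inactive, "imbalanced": imbalanced}


# Hypothetical host with four paths: one idle, one carrying most of the load.
host_paths = {"path1": 4000, "path2": 4100, "path3": 0, "path4": 12000}
balance_report = check_path_balance(host_paths)
```

Here `path3` would be reported as inactive and `path4` as imbalanced, which is the kind of result that suggests a multipath policy or zoning problem worth a closer look.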

 

Discover if Application Problems are Related to Storage Access


SAN Analytics enables you to monitor the Exchange Completion Time (ECT) for an exchange.  This is a measure of how long a command takes to complete.  An overly long ECT can be caused by a few different problems, including slow device ports.  However, if SAN Analytics is reporting a long ECT on write commands when there are no issues indicated on the SAN, this often means that the problem is inside the storage system.


Identify the Specific Problematic Hosts Connected to an NPV Port


Hypervisors such as VIOS, VMware or Hyper-V all use N_Port ID Virtualization (NPIV) to have virtual machines log into the same set of physical switch ports on a fabric.  Customers frequently also have physical hosts connecting through an NPV device.  An example of this is a Cisco UCS chassis with several hosts connecting through a fabric extender to a set of switch ports.  In these situations, getting a traffic breakdown per server or virtual HBA from just the data available in the switch data collection is challenging.  It is even more so when you are trying to troubleshoot a slow drain situation.  Switch statistics can point to a physical port, but if multiple virtual hosts are connected to that port, it is often difficult to determine which host is at fault.  There are a few Cisco commands that can be run to try to determine this, but they need to be run in real time, and on a large and busy switch you can easily miss the moment the problem occurs, because the data from these commands can wrap every few seconds.


The SAN Analytics engine collects this data for each of the separate hosts.  This gives you the ability to drill down to specific hosts in minute detail.  Once you identify a specific switch port that is experiencing slow drain, you can then use the data available in SAN Analytics to determine which of the hosts attached to that port is the culprit.
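
Conceptually, that drill-down is a group-by: take the per-flow records for the suspect switch port, group them by initiator, and rank the initiators by a metric such as ECT.  The sketch below shows the idea.  The record layout, the field names (`port`, `initiator`, `ect_us`) and the FCID-style identifiers are hypothetical and do not reflect the actual SAN Analytics schema.

```python
from collections import defaultdict


def rank_initiators_on_port(flow_records, switch_port):
    """Return (initiator, average ECT) pairs for one switch port, worst first.

    flow_records: iterable of dicts with 'port', 'initiator' and 'ect_us'
    keys. This layout is an assumption made for the example.
    """
    # Per initiator: [sum of ECT values, number of records].
    totals = defaultdict(lambda: [0.0, 0])
    for record in flow_records:
        if record["port"] != switch_port:
            continue  # Ignore flows seen on other switch ports.
        entry = totals[record["initiator"]]
        entry[0] += record["ect_us"]
        entry[1] += 1
    averages = {init: total / count for init, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records: two virtual hosts share fc1/5, a third is on fc1/6.
records = [
    {"port": "fc1/5", "initiator": "0x010001", "ect_us": 500},
    {"port": "fc1/5", "initiator": "0x010001", "ect_us": 520},
    {"port": "fc1/5", "initiator": "0x010002", "ect_us": 4000},
    {"port": "fc1/5", "initiator": "0x010002", "ect_us": 4400},
    {"port": "fc1/6", "initiator": "0x010003", "ect_us": 300},
]
ranking = rank_initiators_on_port(records, "fc1/5")
```

With these made-up numbers the initiator `0x010002` lands at the top of the list, which is the host you would investigate first behind that port.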


If you want to learn more:


The Cisco and IBM C-Type Family


How IBM and Cisco are working together
