Posts

Showing posts from 2020

Long Distance Fibre Channel Link Tuning

In this video I talk about some of the variables involved in tuning long-distance Fibre Channel links. In this blog post I'll detail some of the tools that are available. I will also provide an example of estimating the number of buffer credits you will need. Note that this tuning is only for Fibre Channel links. It does not apply to FCIP tunnels or circuits. One critical piece of information that you will need to calculate buffer credits is the frame size. Smaller frames mean more of them can fit in the link, so you would need more buffer credits. Of the variables that go into the formula, this is the only unknown. Everything else is either known or is a constant. Brocade has the 'portbuffershow' command that can tell you the average frame size for a link. You would look at the Framesize columns for TX and RX in the portbuffershow output to get the frame size. The portbuffershow output is organized by logical switch and then by port. O
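As a rough illustration of how frame size drives the estimate, here is a minimal Python sketch. It is not the formula from the post or from Brocade; it simply assumes that enough credits are needed to cover the round trip of one frame and its acknowledgement. The function name, the 5 microseconds per km fibre delay, and the nominal data rate you pass in (for example 800 MB/s for an 8 Gbps link) are all assumptions for illustration.

```python
import math

# Assumption: light travels through fibre at roughly 5 microseconds per km.
FIBRE_DELAY_US_PER_KM = 5.0


def estimate_buffer_credits(distance_km, data_rate_mb_per_s, avg_frame_bytes):
    """Hypothetical helper: credits needed to keep a long link full.

    data_rate_mb_per_s is the nominal link throughput in MB/s, which is
    numerically the same as bytes per microsecond.
    """
    # Time for a frame to reach the far end plus the credit return trip.
    round_trip_us = 2 * distance_km * FIBRE_DELAY_US_PER_KM
    # Time needed to put one average-sized frame on the wire.
    frame_time_us = avg_frame_bytes / data_rate_mb_per_s
    # One credit per frame that can be in flight during the round trip.
    return math.ceil(round_trip_us / frame_time_us)


if __name__ == "__main__":
    # 50 km link at a nominal 800 MB/s with full-size and half-size frames.
    print(estimate_buffer_credits(50, 800, 2148))
    print(estimate_buffer_credits(50, 800, 1024))
```

Under these assumptions the smaller average frame size roughly doubles the estimate, which is why the measured frame size from portbuffershow matters more than any other input.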

Using the IBM Storage Insights Pro Grouping Features

I recently posted this post on how you can help IBM Storage Support help you by ensuring you are utilizing the full monitoring features available on your storage systems and switches. You should also have at least the free version of IBM Storage Insights installed. If you have Storage Insights Pro or Storage Insights for Spectrum Control, there are some additional steps you should take that will help both you and the IBM Support team resolve your problems as quickly as possible. IBM Storage Insights Pro and Storage Insights for Spectrum Control come with some powerful features for grouping and organizing storage resources. These features are found under the Groups menu. You can organize your storage resources into Applications, Departments and General Groups. There is a hierarchy to the organization of resources. Departments can contain sub-departments, Applications or General Groups. Applications can contain hosts or other applications. General Groups can

Help IBM Storage Support Help You

I had a client recently ask me what was the most effective thing his company could do to get me the data that would be the most helpful in troubleshooting problems in his solution. This was after we were unable to provide a definitive root cause for a problem that occurred intermittently in his solution. He had a fairly simple fabric that consisted of two 96-port switches, a few IBM Storage Systems and 30 or so hosts. His problem was an issue with performance on the hosts. At the time, the best I was able to tell him was that the data indicated a slight correlation between host read activity and a performance problem, but I was not able to confirm anything with certainty. My answer was simple: configure better event detection and system logging. This is something I teach as a best practice at IBM Technical University. I also suggested that his company install at least the free version of IBM Storage Insights. Without a performance monitoring tool, troubleshooting performa

IBM Announces IBM SANnav

IBM announced IBM SANnav today. You can register for a webinar to learn more about SANnav here. SANnav is a next-generation SAN management application. It was built from the ground up with a simple, browser-based user interface. It can streamline common workflows, such as configuration, zoning, deployment, troubleshooting, and reporting. The modernized GUI can improve operational efficiency by enabling enhanced monitoring capabilities, faster troubleshooting, and advanced analytics. Key features and capabilities include: Configuration management: You can use policy-based management to apply consistent configurations across the switches in your fabrics. SANnav also makes zoning devices easier by providing a more intuitive interface than previous management products. Dashboards: You can see at-a-glance views and summary health scores for fabrics, switches, hosts, and targets that may be contributing to performance issues within the network. You can instantly navigate to

Integrating Broadcom Flow Vision Rules with MAPS

Sound monitoring and syslogging practices are the first and sometimes most important step in troubleshooting. They are also the most overlooked, as they must be configured before a problem happens. If system logging is not configured before a problem happens, valuable information is lost. Broadcom has two important features that you can use to monitor the health of your Broadcom fabrics and alert you when problems are detected: Flow Vision (the monitoring) and Monitoring and Alerting Policy Suite (MAPS), which can both monitor and alert if it detects error conditions. In this post I'll provide a brief overview of each feature and then we'll see how we can integrate Flow Vision into MAPS to provide a comprehensive monitoring and alerting solution. Flow Vision: Flow Vision provides a detailed view of the traffic between devices on your fabrics. It captures traffic for analysis to find bottlenecks, see excessive bandwidth utilization, and look at other similar flow-based fabric

Cisco SAN Analytics and Telemetry Streaming - Why Should I Use Them?

Are you sometimes overwhelmed by performance problems on your Storage Network? Do you wish you had better data on how your network is performing? If you answered yes to either of these questions, read on to find out about Cisco SAN Analytics and Telemetry Streaming. The Cisco SAN Analytics engine is available on Cisco 32Gbps and faster MDS 9700 series port modules and the 32 Gbps standalone switches. This engine is constantly sampling the traffic that is running through the switches. It provides a wealth of statistics that can be used to analyze your Cisco or IBM c-Type fabric. Telemetry Streaming allows you to use an external application such as Cisco Data Center Network Manager to sample and visualize the data that the analytics engine generates, so you can find patterns in your performance data and identify problems or predict the likelihood of a problem occurring. You can find an overview of both SAN Analytics and Telemetry Streaming here. That link also includes a complet

Implementing a Cisco Fabric for Spectrum Virtualize Hyperswap Clusters

I wrote this previous post on the general requirements for SAN Design for Spectrum Virtualize Hyperswap and Stretched clusters. In this follow-on post, we'll look at a sample implementation on a Cisco or IBM C-type fabric. While there are several variations on implementation (FCIP vs Fibre-Channel ISL is one example) the basics shown here can be readily adapted to any specific design. This implementation will also show you how to avoid one of the most common errors that IBM SAN Central sees on Hyperswap clusters, where the ISLs on a Cisco private VSAN are allowed to carry traffic for multiple VSANs. We will implement the below design, where the public fabric is VSAN 6, and the private fabric is VSAN 5. The below diagram is a picture of one of two redundant fabrics. The quorum that is depicted can be either an IP quorum or a third-site quorum. For the purposes of this blog post, VSAN 6 has already been created and has devices in it. We'll be creating VSAN 5, adding t

Physical Switch SAN Implementation for an SVC Hyperswap Cluster

In February 2020 I wrote this post on the supported SAN design for SVC and Spectrum Virtualize Hyperswap clusters. In that post I covered some of the problems that arise with improper SAN design for SVC clusters in a Hyperswap configuration. The requirement at its most basic when using Hyperswap is to have completely separate fabrics for private traffic, where the private traffic is used only for the inter-node communication within the cluster, and one or more public fabrics for everything else. There are various ways that SANs can be implemented to meet that requirement. This is one in a series of blog posts that will discuss some of the options for fabric design within that framework and provide some implementation details on Cisco and Brocade fabrics. I will also show you some of the common mistakes that are made in the SAN implementation. As with that post, while I may only reference SVC in this series (for example, the diagram below depicts an SVC clust

To Trunk or Not To Trunk, That Is the Question

Separate ISLs. Trunked ISLs. I have had several conversations recently with customers who have asked whether, when they have multiple inter-switch links (ISLs) between switches, those links should be aggregated into a single logical link. Above we have the two possible configurations for links between switches. On a Broadcom switch these are called trunks. On a Cisco switch these are called port-channels. The word 'trunk' has a different meaning on a Cisco switch. For Cisco, an inter-switch link (ISL) is trunking when it is carrying traffic for multiple VSANs. This applies to both single-link ISLs and to port-channels. If it is only carrying traffic for a single VSAN it is not trunking. This blog post uses 'trunk' to mean link aggregation. The first image above depicts three ISLs configured as separate, standalone ISLs. The second image depicts the three links aggregated as a single logical link. When link aggregation is configured,

SAN Design Best-Practices for IBM SVC and FlashSystem Stretched and Hyperswap Clusters

I recently worked with a customer who had their SAN implemented as depicted in the diagram for a Hyperswap V7000 cluster. An SVC or FlashSystem cluster that is configured for Hyperswap has half of the nodes at one site, and half of the nodes at the other site. The I/O groups are configured so that nodes at each site are in the same I/O group. In the example from the diagram, the nodes at Site 1 were in one I/O group, and the nodes at the other site were in another I/O group. A stretched cluster also has the nodes in a cluster at two sites; however, each I/O group is made up of nodes from each site. So for our diagram below, in a Stretched configuration, a node from Site 1 and a node from Site 2 would be in an I/O Group. From the diagram we see that each site had two V7000 nodes. Each node had connections to two switches at each site. The switches were connected as pictured in the diagram to create two redundant fabrics that spanned the sites. The customer had hosts and third-p