Posts

Using Performance Data To See Network Problems

I frequently work cases that involve a performance problem. Either an entire system or an application is slow enough that users are affected. Another frequent performance problem is with storage-side replication. In these cases replication cannot keep up with the production workload and RPOs are not being met. Replication is most commonly done between sites, though I have worked a few cases with same-site (or campus) replication. Whether you are using IBM DS8000 PPRC/Global Mirror, IBM SVC/FlashSystem Global Mirror (GM) or Global Mirror with Change Volumes (GMCV), you expect that the replicated data will be current up to a certain point in time behind the production data. This is your Recovery Point Objective (RPO). Your RPO is how current the replicated data needs to be. For data that doesn't change often, an RPO of 30 minutes or an hour might be enough. For data that changes frequently, an RPO of a few minutes might be required. For weekly reporti

Troubleshooting Slow Drain Devices on Broadcom Switches

Slow drain devices are one of the more common problems on storage networks. They can occur for a variety of reasons. For a refresher on how they can affect your storage network you should watch this video. In this blog post I will go through the basic steps to troubleshoot a slow drain device on a Broadcom fabric. I will be using command line output from switches. The CLI format lends itself to a blog post more readily than screen shots from a GUI, and the commands are consistent across different versions of FOS. SANnav is a huge change from Brocade or IBM Network Advisor and the screens would look quite different between the two. The first command we will be using is porterrshow. The above output has been truncated to the ports we are interested in. The counters of interest are in the c3timeout column. You can see that there are 2 sub-columns, 'tx' and 'rx'. 'tx' means the switch is trying to send frames to the device attached to that p

Spectrum Virtualize NPIV and Host Connectivity

A while ago I wrote this post as an introduction to the Spectrum Virtualize NPIV feature. In this follow-up post I thought I would focus more on host connectivity and the effects of NPIV. You can watch a quick review of the NPIV feature in this IBM Systems Rockstar video:

NPIV has 3 modes:
1. Disabled - this mode means that hosts cannot connect to the virtual World Wide Port Names (WWPNs) on the Spectrum Virtualize cluster, regardless of the fabric zoning.
2. Transitional - this mode means hosts can connect to either the physical or virtual WWPNs on the cluster. If a host is zoned to both, it will connect to both. Transitional mode is meant to be used only while you are migrating to NPIV mode and rezoning your hosts to the virtual WWPNs. It is not meant to be used permanently or even long-term.
3. Enabled - this means hosts can only connect to the virtual WWPNs. If they are zoned to the physical WWPNs the connection will be listed as 'blocked' in the devic

IBM Spectrum Virtualize Safeguarded Copy

Several months ago I was asked by a local organization if I could recover files from a system that had been encrypted by a ransomware attack. After looking at the hard drive in the system and doing some research, I told the organization that I could not. The organization did not have a backup of the files, at least not a recent one. The most critical data loss for this organization was financial records. It took a few months and a lot of work to recover most of the missing records. Had the organization done something as simple as periodically plugging in a USB drive, running a backup and then removing the drive, it would have saved a lot of work. The USB drive is somewhat of an immutable copy of the data, at least as long as it is not plugged into the computer while the computer is still infected. However, a USB-attached drive doesn't really scale well at the enterprise level, and it is not a true immutable copy, since if it is plugged back into a computer that is infected, i

Brocade Fabric Performance Impact Notification

Brocade Fabric Performance Impact Notification (FPIN) was released in Broadcom FOS v9.0. It is available on Brocade Gen6 and Gen7 switches. This feature enables the switch to detect issues on a fabric such as congestion or physical link issues and then notify the affected devices that have registered for these notifications. FPIN functions through a mechanism similar to RSCN. RSCN enables the fabric to send notifications to devices when a device they are zoned to is going offline. The devices that receive these notifications can then proactively take steps such as path failover rather than have to react to a path being down. FPIN provides a means to notify devices of link or other issues with a connection to a fabric or a path through it. For both RSCN and FPIN, a device must register with fabric services to receive these notifications. The new Brocade Gen7 hardware can send hardware or software signal notifications. Gen6 can only send software notificati

Long Distance Fibre Channel Link Tuning

In this video I talk about some of the variables involved in tuning long distance fibre-channel links. In this blog post I'll detail some of the tools that are available. I will also provide an example of estimating the number of buffer credits you will need. Note that this tuning is only for fibre-channel links. It does not apply to FCIP tunnels or circuits. One critical piece of information that you will need to calculate buffer credits is the frame size. Smaller frames mean more of them can fit on the link, so you would need more buffer credits. Of the variables that go into the formula, this is the only unknown. Everything else is either known or is a constant. Brocade has the 'portbuffershow' command that can tell you the average frame size for a link. You would look at the Framesize columns for TX and RX in the portbuffershow output to get the frame size. The portbuffershow output is organized by logical switch and then by port. O
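
As a rough illustration of why frame size matters for buffer credits, here is a minimal Python sketch. It is not the formula from the post: it assumes the common approximation that a link needs enough credits to cover every frame in flight during one round trip. The function name, the 5 microseconds per kilometre fibre delay, the 2148-byte full frame size and the 800 MB/s throughput figure are all assumptions chosen for the example.

```python
import math

# Assumed constants (not taken from the post): light takes roughly
# 5 microseconds to travel one kilometre of fibre, and a full-size
# Fibre Channel frame is about 2148 bytes on the wire.
FIBRE_DELAY_US_PER_KM = 5.0
FULL_FRAME_BYTES = 2148


def estimate_buffer_credits(distance_km, throughput_mb_per_s,
                            avg_frame_bytes=FULL_FRAME_BYTES):
    """Estimate the credits needed to keep a link full for one round trip.

    Hypothetical helper: approximates credits as the number of
    average-size frames that fit in the round-trip time.
    """
    round_trip_us = 2 * distance_km * FIBRE_DELAY_US_PER_KM
    # 1 MB/s is one byte per microsecond (using 1 MB = 10**6 bytes).
    bytes_in_flight = round_trip_us * throughput_mb_per_s
    return math.ceil(bytes_in_flight / avg_frame_bytes)


if __name__ == "__main__":
    # Assumed example: 100 km link carrying about 800 MB/s.
    print(estimate_buffer_credits(100, 800))        # full-size frames
    print(estimate_buffer_credits(100, 800, 1024))  # smaller average frame
```

Under these assumptions the second call returns a larger number than the first, which matches the point above: the smaller the average frame size reported for the link, the more credits are needed to keep it full.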

Using the IBM Storage Insights Pro Grouping Features

I recently published this post on how you can help IBM Storage Support help you by ensuring you are utilizing the full monitoring features available on your storage systems and switches. You should also have at least the free version of IBM Storage Insights installed. If you have Storage Insights Pro or Storage Insights for Spectrum Control, there are some additional steps you should take that will help both you and the IBM Support team resolve your problems as quickly as possible. IBM Storage Insights Pro and Storage Insights for Spectrum Control come with some powerful features for grouping and organizing storage resources. These features are found under the Groups menu. You can organize your storage resources into Applications, Departments and General Groups. There is a hierarchy to the organization of resources. Departments can contain sub-departments, Applications or General Groups. Applications can contain hosts or other applications. General Groups can