Brocade Fabric Performance Impact Notification

Brocade Fabric Performance Impact Notification (FPIN) was released in Broadcom FOS v9.0. It is available on Brocade Gen6 and Gen7 switches. This feature enables the switch to detect issues on a fabric, such as congestion or physical link problems, and then notify the affected devices that have registered for these notifications. FPIN works in a similar way to RSCN (Registered State Change Notification). RSCN enables the fabric to notify devices when a device they are zoned to is going offline. The devices that receive these notifications can then proactively take steps such as path failover rather than having to react to a path being down.

FPIN provides a means to notify devices of link or other issues with a connection to a fabric or a path through it. For both RSCN and FPIN, a device must register with fabric services to receive these notifications. The new Brocade Gen7 hardware can send hardware or software signal notifications. Gen6 can send only software notifications. Both the hardware and software notifications require FOS v9.0 on the switches.
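The generation and firmware rules above can be summarized as a small lookup. This is only an illustrative sketch: the function name `supported_signals` and the string labels are invented for the example and are not part of any FOS command or API.

```python
from typing import Set, Tuple


def supported_signals(generation: int, fos_version: Tuple[int, int]) -> Set[str]:
    """Return the FPIN signal types a switch can send, per the rules in this post.

    Hypothetical helper for illustration only; not a FOS interface.
    """
    # Both hardware and software notifications require FOS v9.0 or later.
    if fos_version < (9, 0):
        return set()
    if generation == 7:
        return {"hardware", "software"}
    if generation == 6:
        return {"software"}
    # Other platforms are not covered by this post.
    return set()


print(supported_signals(7, (9, 0)))
print(supported_signals(6, (9, 0)))
```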

Hardware signals are sent from the switch to the adapter in the device. The adapter can then decide what to do about the notification. Software signals are sent higher up in the Fibre Channel stack, and the adapter driver then decides how to handle the notification. One advantage of notifications in hardware is reaction time - the adapter can process the notifications and react more quickly than the driver can. Another is that the hardware-based notification is a Fibre Channel primitive. This means that even if buffer credits are depleted, the signal can still be delivered to the device on the other end of the link.

Primitives are not frames, so they do not need buffer credits to be sent. The software-layer signal is an ELS frame, so it can be affected by buffer credit depletion and other link congestion. Whether the signal sent is hardware or software, how the devices handle the notifications is up to the vendor of the adapter. Some may log the notification, some may take action. The action that an adapter takes is also vendor specific.
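The difference between a primitive and an ELS frame under credit depletion can be shown with a toy model. This is a deliberately simplified sketch, not how a real port is implemented: the class `ToyLink`, its methods, and the signal labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLink:
    """Toy model of one direction of a Fibre Channel link (illustration only)."""
    credits: int                                   # remaining buffer credits
    delivered: list = field(default_factory=list)  # what reached the far end

    def send_primitive(self, name: str) -> bool:
        # Primitives are not frames, so they do not consume a buffer credit.
        self.delivered.append(("primitive", name))
        return True

    def send_frame(self, name: str) -> bool:
        # A frame (such as an ELS) needs a credit; with none left it cannot go.
        if self.credits == 0:
            return False
        self.credits -= 1
        self.delivered.append(("frame", name))
        return True


link = ToyLink(credits=0)                        # credits already depleted
print(link.send_frame("FPIN ELS"))               # False
print(link.send_primitive("congestion signal"))  # True
```

With zero credits the frame is held back while the primitive still gets through, which is the property that makes the hardware signal useful on a congested link.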
FPIN can alert devices about these events:
  • End Device Congestion
  • Device Link Integrity (CRC)
  • Frame Drops
If FPIN is enabled, these events are still monitored via MAPS. Enabling FPIN won't change your existing MAPS configuration for the above events. With FPIN, notifications are sent to the affected devices that register for them. How the devices handle the notification is vendor specific. They may just log the event, or they may take other steps such as starting link recovery or slowing traffic on a congested link and re-routing out an uncongested port. As a last resort, the device may shut down a troublesome link.
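To make the "log it or act on it" choice concrete, here is a minimal sketch of a device-side handler. Everything in it is an assumption made for illustration: the `FpinEvent` enum, the `POLICY` table, and the action names do not come from any vendor's driver, and real adapters may behave quite differently.

```python
from enum import Enum


class FpinEvent(Enum):
    CONGESTION = "End Device Congestion"
    LINK_INTEGRITY = "Device Link Integrity (CRC)"
    FRAME_DROPS = "Frame Drops"


# Hypothetical policy: which action this imaginary device takes per event.
POLICY = {
    FpinEvent.CONGESTION: "throttle",
    FpinEvent.LINK_INTEGRITY: "failover",
    FpinEvent.FRAME_DROPS: "log",
}


def handle_fpin(event: FpinEvent, port: str, log: list) -> str:
    """Log every notification, then return the action this toy policy picks."""
    log.append(f"FPIN on {port}: {event.value}")
    # Fall back to logging only if the policy has no entry for the event.
    return POLICY.get(event, "log")


events_log = []
print(handle_fpin(FpinEvent.CONGESTION, "port0", events_log))  # throttle
print(events_log)
```

A device that only logs notifications would simply map every event to "log".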

Some vendors that support FPIN today are:
  • Linux Multi-Path in RHEL 8.2
  • Emulex - supports Congestion and Link Integrity notifications on Linux
  • Marvell - will register for FPIN and log the notifications; these could be used as a source of log data for troubleshooting
  • AIX - will register for Link Integrity and Congestion notifications
We expect that more HBA and storage controller vendors will add support for FPIN in the future.

One use case for FPIN is congestion on an ISL or on a path between devices. If a switch detects it, the switch could notify the device sending data so that the device could try sending data down another path without waiting for timeouts and path failover to happen.

A common cause of congestion is a speed mismatch between two devices that are zoned together. In these cases, the faster device can throttle back and send data at a slower rate to the slower device. Some caveats: whether this happens is vendor specific for storage systems and host adapters, and throttling data rates would only work on the host side unless a storage system could selectively throttle depending on the destination address.

Another use case is link integrity issues. If a link is accumulating CRC errors or Invalid Transmission Words (ITWs), the physical link most likely has a faulty component. A Fibre Channel cable can be bad in only one direction, so it is possible that the device at one end of a link is not aware of any issues. The Link Integrity FPIN will notify the host adapter if a path is compromised. The adapter can then determine whether it should try another path by having the multi-path driver fail over. This would happen at the hardware level, long before the problem bubbled up to the software layer.
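The failover decision described above can be sketched in a few lines. This is a simplified illustration, not a real multi-path driver: the functions `mark_degraded` and `pick_path` and the path names are hypothetical.

```python
from typing import Dict, Optional


def mark_degraded(paths: Dict[str, bool], port: str) -> None:
    """Record that a Link Integrity FPIN was received for this path."""
    if port in paths:
        paths[port] = False


def pick_path(paths: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy path, or None if every path is degraded."""
    for port, healthy in paths.items():
        if healthy:
            return port
    return None


paths = {"hba0": True, "hba1": True}  # hypothetical path names, True = healthy
mark_degraded(paths, "hba0")          # notification arrives for hba0
print(pick_path(paths))               # hba1
```

The point of the sketch is that the host learns about the bad path from the notification itself, rather than from I/O timeouts on that path.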

One final note: remember that an FPIN can be sent from any device that supports it. Potentially the storage, the host, or the switch can share this information, and if they all had the capability to re-route data based on these notifications, the SAN would be that much closer to an autonomous, self-healing SAN that routes data around blockages as best it can.
