Brocade Fabric Performance Impact Notification (FPIN) was released in Broadcom FOS v9.0. It is available on Brocade Gen6 and Gen7 switches. This feature enables the switch to detect issues on a fabric, such as congestion or physical link problems, and then notify the affected devices that have registered for these notifications. FPIN works in a similar way to RSCN. RSCN enables the fabric to send notifications to devices when a device they are zoned to is going offline. The devices that receive these notifications can then proactively take steps such as path failover rather than having to react to a path being down.
FPIN provides a means to notify devices of link or other issues with a connection to a fabric or a path through it. For both RSCN and FPIN, a device must register with fabric services to receive these notifications. The new Brocade Gen7 hardware can send hardware or software signal notifications. Gen6 can only send software notifications. Both the hardware and software notifications require FOS v9.0 on the switches.
Hardware signals can be sent from the switch to the adapter in the device. The adapter can then decide what to do about the notification. Software signals are sent higher up in the Fibre-Channel stack, and the adapter driver would then decide how to handle the notification. One advantage to notifications in hardware is reaction time - the adapter can process the notifications and react more quickly than the driver can. Another is that the hardware-based notification is a fibre-channel primitive. This means that even if buffer credits are depleted the signal can still be delivered to the device on the other end of the link.
Primitives are not frames so do not need buffer credits to be sent. The software layer signal is an ELS frame, so can be affected by buffer credit depletion and other link congestion. Whether the signal sent is hardware or software, how the devices handle the notifications is up to the vendor of the adapter. Some may log the notification, some may take action. The action that an adapter takes is also vendor specific.
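To make the difference concrete, here is a toy model in Python. It is not a representation of any real switch or HBA firmware: the Link class, its method names, and the credit counts are all made up for illustration. The only behavior taken from the text above is that a frame consumes a buffer credit while a primitive does not.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Toy one-directional link with buffer credits (hypothetical model)."""
    credits: int
    delivered: list = field(default_factory=list)

    def send_frame(self, name: str) -> bool:
        # Frames, including an ELS frame carrying a software FPIN, need a credit.
        if self.credits == 0:
            return False  # sender has to wait until a credit is returned
        self.credits -= 1
        self.delivered.append(name)
        return True

    def send_primitive(self, name: str) -> bool:
        # Primitives are not frames, so no credit is consumed.
        self.delivered.append(name)
        return True


link = Link(credits=1)
link.send_frame("data frame")                          # uses the last credit
frame_ok = link.send_frame("FPIN ELS")                 # blocked, no credits left
signal_ok = link.send_primitive("congestion signal")   # still gets through
print(frame_ok, signal_ok, link.delivered)
```

In this sketch the software notification is stuck behind the same credit starvation it is trying to report, while the hardware signal is not.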
FPIN can alert devices about these events:
End Device Congestion
Device Link Integrity (CRC)
If FPIN is enabled, these events are still monitored via MAPS. Enabling FPIN won't change your existing MAPS configuration for the above events. With FPIN, notifications are sent to the affected devices that register for them. How the devices handle the notification is vendor specific. They may just log the event or they may take other steps such as starting link recovery or slowing traffic on a congested link and re-routing out an un-congested port. As a last resort, the device may shut down a troublesome link.
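As a rough illustration of "log only" versus "take action", the sketch below shows how a driver might dispatch on the notification type. Everything in it is hypothetical: the Event names, the policy strings, and the action text are invented for this example and do not correspond to any vendor's driver or to the on-the-wire FPIN format.

```python
from enum import Enum, auto


class Event(Enum):
    # Hypothetical labels for the two event types listed above.
    CONGESTION = auto()
    LINK_INTEGRITY = auto()


def handle_notification(event: Event, port: str, policy: str = "log") -> list:
    """Return the actions a hypothetical driver would take for one notification."""
    actions = [f"log {event.name} on {port}"]  # every policy at least logs
    if policy == "act":
        if event is Event.CONGESTION:
            actions.append(f"reduce send rate on {port}")
        elif event is Event.LINK_INTEGRITY:
            actions.append(f"ask multipath to prefer other paths over {port}")
    return actions


print(handle_notification(Event.CONGESTION, "port0"))
print(handle_notification(Event.LINK_INTEGRITY, "port0", policy="act"))
```

A vendor that only logs corresponds to the default policy here. A vendor that reacts would add its own steps after the log entry.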
Some vendors and products that support FPIN today are:
Linux Multi-Path in RHEL 8.2
Emulex - supports Congestion and Link Integrity notifications on Linux
Marvell - will register for FPIN and log the notifications, which could be used as a source of log data for troubleshooting
AIX - will register for Link Integrity and Congestion notifications
We expect that more HBA and storage controller vendors will add support for FPIN in the future.
One use case for FPIN is congestion on an ISL or a path between devices. If a switch detects it, the switch could notify the device sending data so that the device could try sending data down another path without waiting for timeouts and path failover to happen.
A common cause of congestion occurs when two devices are zoned together with a speed mismatch. In these cases, the faster device can throttle back and send data at a slower rate to the slower device. Some caveats here are that the behavior would be vendor specific for storage systems or host adapters, and that throttling data rates would only work on the host side, unless a storage system could selectively throttle depending on the destination address.
Another use case is link integrity issues. If a link is accumulating CRC errors or Invalid Transmission Words (ITWs), the physical link has a faulty component. A fibre-channel cable can be bad in only one direction, so it is possible that the device at one end of a link is not aware of any issues. The Link Integrity FPIN will notify the host adapter if a path is compromised. The adapter can then determine whether it should try another path by having the multi-path driver fail over. This would happen at the hardware level, long before the problem bubbled up to the software layer.
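The path decision described here can be sketched in a few lines of Python. This is only an assumption about how a multi-path layer might use the information, not how any particular driver does it: the pick_path function, the path names, and the idea of a "marginal" set are all hypothetical.

```python
def pick_path(paths: list, marginal: set) -> str:
    """Prefer paths with no link integrity notification; fall back if none are clean."""
    if not paths:
        raise ValueError("no paths available")
    clean = [p for p in paths if p not in marginal]
    # If every path has been flagged, a flagged path is still better than no path.
    return (clean or paths)[0]


paths = ["fabric_a", "fabric_b"]
marginal = set()
before = pick_path(paths, marginal)   # no notifications yet, first path is used

marginal.add("fabric_a")              # a Link Integrity FPIN arrives for fabric_a
after = pick_path(paths, marginal)    # traffic moves before the path actually fails
print(before, after)
```

The point of the sketch is the ordering: the path is avoided as soon as it is flagged, rather than after I/O on it has timed out.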
One final note: remember that an FPIN can be sent from any device that supports it. Potentially the storage, the host, or the switch can share this information, and if they all had the capability to re-route data based on these notifications, the SAN would be that much closer to an autonomous, self-healing SAN that routes data around blockages as best it can.
Zoning Basics
Before I talk about some zoning best-practices, I should explain two different types of zoning and how they work. There are two types of zoning: WWPN Zoning and Switch-Port Zoning
World-wide Port Name (WWPN) Zoning
WWPN zoning is also called "soft" zoning and is based off the WWPN that is assigned to a specific port on a fibre-channel adapter. The WWPN serves a similar function as a MAC address does on an ethernet adapter. WWPN-based zoning uses the WWPN of devices logged into the fabric to determine which device can connect to which other devices. Most fabrics are zoned using WWPN zoning. It is more flexible than switch-port zoning - a device can be plugged in anywhere on the SAN (with some caveats beyond the scope of this blog post) and the device can connect to the other devices it is zoned to. It has one distinct advantage over Switch-Port based zoning, which is that zoning can always be specified on a single WWPN level.
Switch-Port Z
I recently worked with a customer who had their SAN implemented as depicted in the diagram for a Hyperswap V7000 cluster. An SVC or FlashSystem cluster that is configured for Hyperswap has half of the nodes at one site, and half of the nodes at the other site. The I/O groups are configured so that nodes at each site are in the same I/O group. In the example from the diagram, the nodes at Site 1 were in one I/O group, the nodes at the other were in another I/O group. A stretched cluster also has the nodes in a cluster at two sites, however each I/O group is made up of nodes from each site. So for our diagram below, in a Stretched configuration, a node from Site 1 and a node from Site 2 would be in an I/O Group. From the diagram we see that each site had two V7000 nodes. Each node had connections to two switches at each site. The switches were connected as pictured in the diagram to create two redundant fabrics that spanned the sites. The customer had hosts and third-p
In this video I talk about some of the variables involved in tuning long-distance fibre-channel links. In this blog post I'll detail some of the tools that are available. I will also provide an example of estimating the number of buffer credits you will need. Note that this tuning is only for fibre-channel links. It does not apply to FCIP tunnels or circuits. One critical piece of information that you will need to calculate buffer credits is the frame size. Smaller frames mean more of them can fit in the link, so you would need more buffer credits. Of the variables that go into the formula, this is the only unknown. Everything else is either known or is a constant. Brocade has the 'portbuffershow' command that can tell you the average frame size for a link. You would look at the Framesize columns for TX and RX in the portbuffershow output to get the frame size. The portbuffershow output is organized by logical switch and then by port. O