Supercharging Ethernet … with a PAUSE

It’s been a while since I wrote. I had just finished a 2-week stint in Melbourne, conducting 2 Data ONTAP classes and had a blast.

But after almost 3 1/2 months of doing little except teaching NetApp classes, the stint is ending. I wanted it that way, to take a break and also to take on a new challenge. I will be taking on a job with Hitachi Data Systems, going back to the industry that I have termed the “Wild, wild west”. After a 4 1/2-year hiatus, I think that industry still behaves the way it always has: brash, exclusive, rich! The oligarchy of the oilmen is still laughing its way to the bank. And it will be my job to sell storage (and cloud) solutions to them.

In my NetApp (and EMC) engagements in the past 6 months, I have seen greater adoption of iSCSI over Fibre Channel, and many have predicted that 10 Gigabit Ethernet will be the inflection point where iSCSI can finally stand shoulder-to-shoulder with Fibre Channel. After all, 10 Gigabit/sec is definitely faster than 8 Gigabit/sec Fibre Channel, right? WRONG! (I am perfectly aware there is a 16 Gigabit/sec Fibre Channel, but can’t you see I am trying to start an argument here?)

Delivering a SCSI data payload over iSCSI on 10 Gigabit/sec Ethernet does not necessarily mean that it will be faster than delivering the same payload over 8 Gigabit/sec Fibre Channel. This statement can be viewed in many different ways and hence the favourite IT reply would be … “It depends”.

I will leave this performance argument for another day, but today we are going to talk about some of the key additions that supercharge 10 Gigabit Ethernet for data delivery in a storage networking capacity. In addition, 10 Gigabit Ethernet is the primary transport for Fibre Channel over Ethernet (FCoE), and it is absolutely critical that 10 Gigabit Ethernet be nearly as reliable as Fibre Channel for data delivery in a storage network.

Ethernet is a non-deterministic protocol, and therefore its delivery result is dependent on many factors. Likewise, 10 Gigabit Ethernet has inherited part of that behaviour. The delivery of data over Ethernet can be lossy, i.e. packets can get lost, and the upper layer application protocols have to detect the dropped packets and ensure that lost packets are redelivered to complete the consignment. But delivering data in a storage network cannot be lossy, and in most SANs the requirement is to have the data arrive in the sequence in which it was sent. The SAN fabric (especially with the common services of Layer 3 of the FC protocol stack) and the deterministic nature of the Fibre Channel protocol are the reasons many have relied on Fibre Channel SAN technology for more than a decade. How can 10 Gigabit Ethernet respond?

Here are a few ways and it all starts with Data Center Bridging (DCB). 4 very important features of DCB are worth mentioning. They are:

  • Priority-based Flow Control (PFC) – defined in IEEE 802.1Qbb
  • Enhanced Transmission Selection (ETS) – defined in IEEE 802.1Qaz
  • Congestion Notification (CN) – defined in IEEE 802.1Qau
  • Data Center Bridging Exchange (DCBX), which works in conjunction with Link Layer Discovery Protocol (LLDP). LLDP is defined in IEEE 802.1AB

In this part one of “Supercharging Ethernet”, let’s discuss the most important feature in making Ethernet lossless, which is Priority-based Flow Control (PFC). The objective is to address reliability at Layer 2 Ethernet.

As discussed in many notes and documents, PFC has the ability to send a PAUSE frame from the receiving Ethernet transceiver to the transmitting Ethernet transceiver. Building upon the Ethernet IEEE 802.3x PAUSE frame and its semantics, PFC extends it by introducing multiple classes of service (CoS) within the PAUSE control frame, so that each class can be paused independently. This is the IEEE 802.1Qbb specification, the basis of PFC, and it overcomes the weakness of the original IEEE 802.3x PAUSE, which stops all traffic on the link regardless of class. The diagram below shows how the PAUSE frame works with the 8 different classes of service:

The differences between the 2 types of PAUSE control frame, IEEE 802.3x and IEEE 802.1Qbb, are shown below:

From the diagram above, each of the Time (Class 0), Time (Class 1) … Time (Class 7) segments is a 2-byte value. The value is the pause duration for that class, specified in quanta, where one quantum is the time required to send 512 bits at the current network speed. When the timer counts down to 0, the paused class is UNPAUSED and transmission is resumed. However, rather than “predicting the potential of buffer overflow at the receiver’s end” and when the quanta will reach zero, most implementations take a more explicit and deterministic approach to unpausing: the transmitter receives a notice, similar to XON/XOFF, to continue the traffic flow.
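To make the frame layout and the quanta arithmetic a little more concrete, here is a minimal Python sketch. It assumes the commonly documented field values (MAC control EtherType 0x8808, PFC opcode 0x0101, destination address 01:80:C2:00:00:01) and leaves out the frame check sequence. The function names and the source MAC address are made up for illustration.

```python
import struct

MAC_CONTROL_DST = bytes.fromhex("0180c2000001")  # reserved address for MAC control frames
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101    # the classic IEEE 802.3x PAUSE uses opcode 0x0001
QUANTUM_BITS = 512     # one quantum = time needed to send 512 bits


def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Build a PFC frame (without FCS) that pauses the given classes.

    pause_quanta maps a class (0-7) to a pause time in quanta (0-65535).
    """
    enable_vector = 0
    times = [0] * 8
    for cls, quanta in pause_quanta.items():
        if not 0 <= cls <= 7 or not 0 <= quanta <= 0xFFFF:
            raise ValueError("class must be 0-7 and quanta 0-65535")
        enable_vector |= 1 << cls      # mark this class's time field as valid
        times[cls] = quanta
    header = MAC_CONTROL_DST + src_mac + struct.pack(
        "!HH", MAC_CONTROL_ETHERTYPE, PFC_OPCODE)
    body = struct.pack("!H8H", enable_vector, *times)
    return (header + body).ljust(60, b"\x00")  # pad to the 60-byte minimum


def quanta_to_seconds(quanta: int, link_bps: float) -> float:
    """Convert a pause time in quanta to seconds at a given link speed."""
    return quanta * QUANTUM_BITS / link_bps


# Pause class 3 for the maximum time on a 10 Gigabit/sec link.
frame = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
print(len(frame), "bytes, pause of",
      quanta_to_seconds(0xFFFF, 10e9) * 1e3, "ms")
```

At 10 Gigabit/sec one quantum is 51.2 nanoseconds, so even the largest 2-byte value pauses a class for only about 3.36 milliseconds. That is one reason a receiver has to keep refreshing the pause, or explicitly cancel it, rather than rely on a single frame.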

There are many factors that determine the buffer’s parameters at the receiver. These include factors such as

  • Maximum Transmission Unit (MTU) of the transmitting end of the receiver, because this reduces the receiving queue and moves the quanta closer to zero
  • Speed of the cable
  • Transceiver latency caused by inefficiencies of the technology and the material science that goes into the making of the component
  • Response time of the transmitter
  • MTU of the sending end of the transmitter
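The factors above can be combined into a rough, back-of-envelope estimate of how much buffer headroom a receiver needs once it decides to pause. The sketch below is only an illustration under stated assumptions, not a vendor sizing formula: it assumes both directions run at the same speed, that a PAUSE may have to wait behind one full-size frame the receiver is already sending, and a typical propagation delay of about 5 nanoseconds per metre of cable. The function name, parameters and example values are all hypothetical.

```python
def pfc_headroom_bytes(link_bps, cable_m, receiver_mtu, sender_mtu,
                       transceiver_latency_s=0.0, response_time_s=0.0,
                       propagation_s_per_m=5e-9):
    """Estimate worst-case bytes still arriving after the receiver decides to pause."""
    bytes_per_second = link_bps / 8
    # The PAUSE may be queued behind a full-size frame the receiver is already sending.
    wait_for_own_frame = receiver_mtu / bytes_per_second
    # The PAUSE crosses the cable once, and data already on the cable still arrives.
    round_trip = 2 * cable_m * propagation_s_per_m
    delay = (wait_for_own_frame + round_trip
             + transceiver_latency_s + response_time_s)
    # The sender may have just started a full-size frame when the PAUSE takes effect.
    return delay * bytes_per_second + sender_mtu


# Hypothetical 10 Gigabit/sec link with 2500-byte frames, at 100 m and at 10 km.
for metres in (100, 10_000):
    estimate = pfc_headroom_bytes(10e9, metres, 2500, 2500)
    print(metres, "m ->", round(estimate), "bytes of headroom")
```

With these made-up numbers the estimate is a little over 6 KB at 100 metres but around 130 KB at 10 kilometres, which shows how quickly cable length comes to dominate the other factors.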

The PAUSE/UNPAUSE feature can be juxtaposed with Fibre Channel’s buffer credits because the two are similar in objective but differ in implementation. How do they work together, since the whole reason for PFC is to run Fibre Channel over Ethernet (FCoE)?

Buffer-to-buffer credits (BBC), according to Wikipedia, is a flow control method deployed by Fibre Channel to represent the number of frames a Fibre Channel logical port’s buffer can accept. When the port sends a frame, the BBC value is decremented by 1; when the port receives an R_RDY (receiver ready) signal from the other end, the BBC value is incremented by 1. When the value is zero, the port cannot send, hence stopping transmission until the BBC value increases by 1. The sender and receiver are constantly in communication with each other to ensure that the BBC value is known to both sides.
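The credit counting can be illustrated with a toy model of the sending port. This is a simplified sketch rather than a real Fibre Channel implementation: it assumes the credit count is agreed when the link logs in and that each credit is returned by an R_RDY signal from the receiving side. The class and method names are hypothetical.

```python
class BBCreditPort:
    """Toy model of the sending side of a Fibre Channel link."""

    def __init__(self, negotiated_credits: int):
        # Number of receive buffers the peer advertised (assumed agreed at login).
        self.credits = negotiated_credits

    def can_send(self) -> bool:
        return self.credits > 0

    def send_frame(self) -> None:
        if not self.can_send():
            raise RuntimeError("no buffer credits left; transmission must wait")
        self.credits -= 1   # one more receive buffer at the peer is now in use

    def receive_r_rdy(self) -> None:
        self.credits += 1   # the peer freed a buffer and signalled R_RDY


port = BBCreditPort(negotiated_credits=2)
port.send_frame()
port.send_frame()
print(port.can_send())   # the port is out of credits and has to wait
port.receive_r_rdy()
print(port.can_send())   # one credit has been returned, so it may send again
```

The contrast with PFC is that credits are permission granted in advance, so the sender stops by itself when they run out, whereas PAUSE is a stop signal that has to reach the sender before the receiver’s buffer overflows.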

The PFC frames operate at a lower layer, replacing the lower layer of the Fibre Channel protocol stack, hence the whole model of PFC works well with BBC, up to a certain point. The lossless nature of PFC breaks down as the cable gets longer, and it is unable to guarantee continued “losslessness” due to the factors listed above.

It is indeed interesting to know that Priority-based Flow Control (PFC) is implemented to ensure that delivery is guaranteed and data does not get lost in a storage networking context. And this is important as we merge Fibre Channel into 10 Gigabit Ethernet to implement FCoE.

I hope to continue to learn and share about the other 3 features of Data Center Bridging in my coming blog posts. There’s plenty to learn.

About cfheoh

I am a technology blogger with 20+ years of IT experience. I write heavily on technologies related to storage networking and data management because that is my area of interest and expertise. I introduce technologies with the objective of getting readers to *know the facts*, and to use that knowledge to cut through the marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress. I am involved in SNIA (Storage Networking Industry Association) and as of October 2013, I have been appointed as the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I was previously the Chairman of SNIA Malaysia until Dec 2012. As of August 2015, I am returning to NetApp to be the Country Manager of Malaysia & Brunei. Given my present position, I am not obligated to write about my employer and its technology, but I am indeed subject to the Social Media Guidelines of the company. Therefore, I would like to make a disclaimer that what I write is my personal opinion, and mine alone. I am responsible for what I say and write, and this statement indemnifies my employer from any damages.