When I wrote the article “Let’s smoke this storage peace pipe” five years ago, I wrote:
“NVMe® and NVMe®oF™, as they evolve, can become the Great Peacemaker, bridging both divides and uniting them into a single storage fabric.“
I envisioned NVMe® and NVMe®oF™ setting the equilibrium at the storage architecture level, fusing the great storage fabric into one. This balance in the storage ecosystem, at the level of storage interface specifications and language protocols, has been rapidly unifying storage, and we are already seeing end-to-end NVMe paths directly from the PCIe bus of one host to another, via networks over Ethernet (in RoCE, iWARP, and TCP flavours) and Fibre Channel™. Technically, an end-point device, for example a tablet, can talk the same NVMe language to its embedded storage as it does to cloud NVMe storage in an exascale data centre far, far away. In the past, there were just too many bridges, links, viaducts, aqueducts, bypasses, tunnels, and flyovers to cross just to deliver a storage command, or data in formats encased and encoded (and decoded) in so many different ways.
Simple basics of NVMe®
SATA (Serial ATA) and SAS (Serial Attached SCSI) are not optimized for solid state devices. Besides legacy baggage like AHCI (Advanced Host Controller Interface) in SATA, and archaic SCSI-3 primitives in SAS, NVMe® has so much to offer. It can achieve very high bandwidth and supports 65,535 I/O queues, each with a queue depth of 65,535. The queue depth alone is a massive jump compared to SAS, which has a queue depth limit of 256.
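To put that queue scaling into perspective, here is a quick back-of-the-envelope calculation using the per-spec maxima quoted above (a sketch of theoretical ceilings, not a benchmark of any real device):

```python
# Back-of-the-envelope comparison of maximum outstanding commands,
# using the per-spec maxima quoted above. Theoretical ceilings only.

NVME_MAX_QUEUES = 65_535       # I/O queues per NVMe controller
NVME_MAX_QUEUE_DEPTH = 65_535  # commands per queue
SAS_QUEUE_DEPTH = 256          # SAS queue depth limit quoted above

nvme_outstanding = NVME_MAX_QUEUES * NVME_MAX_QUEUE_DEPTH
print(f"NVMe max outstanding: {nvme_outstanding:,}")  # 4,294,836,225
print(f"SAS max outstanding:  {SAS_QUEUE_DEPTH:,}")
print(f"Ratio: ~{nvme_outstanding // SAS_QUEUE_DEPTH:,}x")
```

Roughly 4.29 billion in-flight commands versus 256 — the gap is not incremental, it is architectural, which is why parallelism-hungry solid state media fits NVMe so naturally.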
A big part of this is how NVMe® handles I/O processing. It has a submission queue (SQ) and a completion queue (CQ), and together they are known as a Queue Pair (QP). The NVMe® controller handles tens of thousands of I/Os (reads and writes) simultaneously, alerted to switch between each SQ and CQ very quickly using MSI or MSI-X interrupts. Think of MSI and MSI-X as a service bell: a hardware register that informs the NVMe® controller when there are requests in the SQ, and informs the host that there are completed requests in the CQ. There will be plenty of “dings” from the MSI-X service register, but the NVMe® controller handles them very well, with some smart interrupt coalescing.
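As a rough mental model of the SQ/CQ dance, the flow can be sketched like this. This is illustrative only, not driver code: real queues are ring buffers in host memory, and the “doorbell” is a memory-mapped hardware register, not a Python flag.

```python
from collections import deque

class QueuePair:
    """Toy model of an NVMe submission/completion Queue Pair (QP).

    Illustrative only: real SQs/CQs are ring buffers in host memory,
    and the doorbell is a memory-mapped register, not a boolean.
    """
    def __init__(self):
        self.sq = deque()           # submission queue: host -> controller
        self.cq = deque()           # completion queue: controller -> host
        self.doorbell_rung = False  # stands in for the MSI/MSI-X "ding"

    def submit(self, command):
        """Host places a command in the SQ and rings the doorbell."""
        self.sq.append(command)
        self.doorbell_rung = True

    def controller_service(self):
        """Controller drains the SQ and posts completions to the CQ."""
        while self.sq:
            cmd = self.sq.popleft()
            self.cq.append(f"completed:{cmd}")
        self.doorbell_rung = False  # interrupt serviced (coalesced)

    def reap_completions(self):
        """Host consumes completion entries from the CQ."""
        done = list(self.cq)
        self.cq.clear()
        return done

qp = QueuePair()
qp.submit("read:LBA0")
qp.submit("write:LBA8")
qp.controller_service()
print(qp.reap_completions())  # ['completed:read:LBA0', 'completed:write:LBA8']
```

The point of the model: the host and controller never block each other. One side appends, the other drains, and the interrupt is just the bell that says “there is work (or finished work) waiting.”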
NVMe® 1.1, as I recall, had 3 admin commands and 10 base commands, which made it very lightweight compared to SCSI-3. The newer NVMe® 2.0 specifications have since added further command sets, including key-value operations and Zoned Namespaces (ZNS).
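The key-value command set is interesting because it drops the logical-block abstraction entirely: the drive stores and retrieves values by an application-chosen key. A loose, dict-backed sketch of the idea (the method names below are illustrative, not the actual NVMe 2.0 wire commands):

```python
class KeyValueNamespace:
    """Toy model of an NVMe 2.0 key-value namespace.

    The real command set defines operations such as Store, Retrieve,
    Delete, Exist and List; this dict-backed sketch mimics only the
    concept, not the specification's encoding or limits.
    """
    def __init__(self):
        self._kv = {}

    def store(self, key: bytes, value: bytes) -> None:
        """Store a value under an application-chosen key (no LBAs)."""
        self._kv[key] = value

    def retrieve(self, key: bytes) -> bytes:
        """Fetch a value back by its key."""
        return self._kv[key]

    def exist(self, key: bytes) -> bool:
        """Check whether a key is present on the namespace."""
        return key in self._kv

ns = KeyValueNamespace()
ns.store(b"user:42", b"profile-blob")
print(ns.retrieve(b"user:42"))  # b'profile-blob'
print(ns.exist(b"user:99"))     # False
```

For object-store-style workloads this removes a whole translation layer: the application no longer maps objects onto blocks, because the drive speaks keys natively.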
NVMe® – the milestones
Many of the things happening in the storage world today were concocted and devised years ago. The industry standards bodies – NVM Express®, INCITS® T11, among many others – already knew the impact solid state devices would have. Since its inception, we have seen numerous development upgrades to the protocol, each bringing NVMe® closer to its objectives and beyond.
The NVMe®oF™ networking siblings
NVMe®oF™ (Non-Volatile Memory Express over Fabrics) is a natural progression of NVMe®. There was already support for NVMe® over Ethernet (notably the iWARP and RoCE v2 flavours) early on, and NVMe® over Fibre Channel™ was developed right alongside the Ethernet specifications. As of version 1.4, NVMe® over TCP was ratified as well.
Golden age of SAN to rise again
NVMe®oF™, specifically NVMe® over TCP, is gaining a lot of traction. Several large storage vendors have thrown their weight behind NVMe® over TCP, just as they did with NVMe® over Fibre Channel™. This is SAN (Storage Area Network) once again, this time carrying the NVMe® payload instead of SCSI, bringing forth FC-SAN and IP-SAN in a completely new light.
With disaggregated and composable storage technologies maturing quickly, and Infrastructure-as-Code with Kubernetes orchestrating all things containers and all things cloud native, think of NVMe® storage resources across any network, anywhere, on every device, all the time, at mega scale and with speed. Think of LinkedIn® or Twitter® handling billions and billions of datasets across all their storage resources around the world, delivered with NVMe® at incredible scale and performance. NVMe® will enable all of this, seen as a concept in the diagram from 2019 below:
This disaggregated NVMe®oF™ deployment model is almost already here.
What is next?
Over the course of the next 2-3 years, we will see NVMe completely dominate the storage media landscape. This was reported by IDC Worldwide Solid State Drive Forecast 2020-2024, Doc #US4590920, summarized in the chart below:
What makes NVMe so exciting is that it operates at the PCIe bus layer. Therefore, NVMe has the ability to merge and communicate bytes, pages and blocks with the CPU (and other burgeoning processors like DPUs, IPUs, xPUs, GPUs, VPUs, SmartNICs, etc.) and the memory complex in the same common dialect. Along with CXL 1.1/2.0, PCIe 5.0, and NVMe 2.0, I am seeing the possibility of a memory cloud, something I have been talking about for almost a decade. Memory, the last bastion for storage, is about to be amalgamated into the storage layer of computing.
The mixture of this memory cloud concoction is almost ready. I cannot wait to see what the future of storage holds.