Storage IO straight to GPU

The parallel processing power of the GPU (Graphics Processing Unit) cannot be denied. One year ago, nVidia® overtook Intel® in market capitalization. Today, its market cap is more than double Intel®’s: [as of July 2, 2021] USD$510.53 billion vs USD$229.19 billion.

Thus it is not surprising that storage architectures are moving away from the CPU-centric paradigm to take advantage of the burgeoning prowess of the GPU. Two announcements in the storage news in recent weeks have caught my attention: the Windows 11 DirectStorage API and nVidia® Magnum IO GPUDirect® Storage.

nVidia GPU

Exciting the gamers

The Windows DirectStorage API feature is only available in Windows 11. It was announced as part of the Xbox® Velocity Architecture last year to take advantage of the high I/O capability of modern-day NVMe SSDs. DirectStorage-enabled applications and games can draw on several technologies, such as a D3D (Direct3D) compression/decompression algorithm designed for the GPU, and SFS (Sampler Feedback Streaming), which uses the results of the previously rendered frame to decide which higher-resolution textures should be loaded into the GPU’s memory and rendered for the real-time gaming experience.
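To make the Sampler Feedback Streaming idea a little more concrete, here is a minimal Python sketch of the decision it describes: look at what the previous frame sampled, compare it with what is already resident, and load only the finer texture data that is missing. This is a toy model and not the Direct3D API. The names TextureResidency and plan_tile_loads, the tile ids and the mip-level numbering (0 is the highest resolution) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TextureResidency:
    """Toy record of which mip level of each tile sits in (simulated) GPU memory."""
    # tile id -> finest resident mip level (0 is the highest resolution)
    resident_mip: dict = field(default_factory=dict)


def plan_tile_loads(feedback, residency, coarsest_mip=4):
    """Return the (tile, mip) loads needed for the next frame.

    feedback maps a tile id to the finest mip level the previous frame
    tried to sample for that tile. Tiles that are not resident yet are
    treated as if only the coarsest mip were available.
    """
    loads = []
    for tile, wanted in sorted(feedback.items()):
        have = residency.resident_mip.get(tile, coarsest_mip)
        if wanted < have:  # the last frame wanted more detail than we hold
            loads.append((tile, wanted))
    return loads


def apply_loads(loads, residency):
    """Mark the planned loads as resident, as if the I/O had completed."""
    for tile, mip in loads:
        residency.resident_mip[tile] = mip
```

With "rock" resident at mip 2 and "tree" at mip 0, feedback asking for rock at mip 0, tree at mip 1 and sky at mip 3 would schedule loads only for rock and sky. The tree tile already holds finer data than the frame needed, so nothing is read for it.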

Windows 11 DirectStorage to the GPU. Diagram from Tweaktown article https://www.tweaktown.com/news/78869/directstorage-uses-new-algorithm-to-unlock-max-io-with-desktop-gpus/index.html

In the diagram above, you can also see that the DirectStorage API skips the CPU for the decompression function, offloading the CPU, and establishes an I/O data flow from the NVMe SSD device straight to the GPU’s memory. Bypassing the CPU removes both a memory bottleneck and an extra hop in the I/O pipeline: the VRAM (video RAM) of the GPU is measured in tens of GBs, while the CPU’s own on-chip memory (its cache) is measured in MBs.
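The effect of moving decompression from the CPU to the GPU can be sketched with some simple byte accounting. The Python snippet below is a toy model only: zlib stands in for whatever compression format a real game would use, and the function name io_accounting and its dictionary keys are hypothetical. It counts how many bytes are read from the SSD, how many are sent on to VRAM, and how many the CPU has to inflate in each case.

```python
import zlib


def io_accounting(asset: bytes, decompress_on: str) -> dict:
    """Count bytes per hop for loading one asset in a toy model.

    decompress_on is "cpu" (inflate in system memory, then send the
    uncompressed data to VRAM) or "gpu" (send the compressed data to
    VRAM and let the GPU inflate it there).
    """
    if decompress_on not in ("cpu", "gpu"):
        raise ValueError("decompress_on must be 'cpu' or 'gpu'")

    # zlib is only a stand-in for a GPU-friendly compression format.
    compressed = zlib.compress(asset)

    if decompress_on == "cpu":
        return {
            "read_from_ssd": len(compressed),
            "sent_to_vram": len(asset),        # uncompressed data crosses the bus
            "cpu_decompressed": len(asset),    # CPU does all of the inflating
        }
    return {
        "read_from_ssd": len(compressed),
        "sent_to_vram": len(compressed),       # data stays compressed until VRAM
        "cpu_decompressed": 0,                 # CPU is offloaded
    }
```

For a highly repetitive asset such as b"texture" * 10000, both paths read the same number of bytes from the SSD, but the "gpu" path sends far fewer bytes to VRAM and leaves the CPU with nothing to decompress.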

The DirectStorage implementation will no doubt greatly improve the gamers’ expansive and immersive experience, with faster load times, on-demand high-fidelity graphics rendering and more. Microsoft is calling on GPU makers like AMD®, nVidia® and Intel® to expand the adoption of this technology.

The Magnum

On the enterprise front, and similar in concept at a high level, the nVidia® Magnum IO GPUDirect® Storage architecture forms a new storage IO pipeline from the ultra-fast NVMe storage device to the GPU via the PCIe switch, as shown in the diagram below.

nVidia Magnum GPU Direct Storage

The Magnum Storage IO subsystem optimizes I/O performance and is able to deliver high throughput to keep the GPUs busy. This is done by using DMA (direct memory access) to copy large data blocks from the storage PCIe complex to the GPU memory complex via the GPUDirect® Storage capability. In a nutshell, storage devices are now able to asynchronously place data blocks directly into specific addresses in the GPU’s memory, without CPU intervention. The offloading and bypass remove the CPU as an obstacle: it no longer has to construct memory structures to facilitate the copy from one memory complex to another in the storage I/O pipeline.
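The difference between the CPU-mediated path and the direct path can be illustrated with a small Python simulation. This is a sketch under stated assumptions and not nVidia's interface (the real programming interface for GPUDirect Storage is the cuFile API). Here a bytearray plays the part of GPU memory, and TransferStats, load_with_bounce_buffer and load_direct are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class TransferStats:
    copies: int = 0                    # number of memory-to-memory copies made
    bytes_through_system_ram: int = 0  # bytes staged in CPU-side memory


def _check_bounds(block: bytes, gpu_memory: bytearray, gpu_offset: int) -> None:
    if gpu_offset < 0 or gpu_offset + len(block) > len(gpu_memory):
        raise ValueError("block does not fit in simulated GPU memory")


def load_with_bounce_buffer(block: bytes, gpu_memory: bytearray,
                            gpu_offset: int) -> TransferStats:
    """CPU-mediated path: storage -> system RAM bounce buffer -> GPU memory."""
    _check_bounds(block, gpu_memory, gpu_offset)
    stats = TransferStats()

    bounce = bytearray(block)  # copy 1: storage into a system-memory buffer
    stats.copies += 1
    stats.bytes_through_system_ram += len(bounce)

    # copy 2: bounce buffer into the target address in GPU memory
    gpu_memory[gpu_offset:gpu_offset + len(bounce)] = bounce
    stats.copies += 1
    return stats


def load_direct(block: bytes, gpu_memory: bytearray,
                gpu_offset: int) -> TransferStats:
    """Direct path: one DMA-style copy from storage to the GPU address."""
    _check_bounds(block, gpu_memory, gpu_offset)
    stats = TransferStats()
    gpu_memory[gpu_offset:gpu_offset + len(block)] = block
    stats.copies += 1
    return stats
```

Both functions leave the same bytes at the same offset in the simulated GPU memory. The bounce-buffer path needs two copies and stages every byte in system RAM, while the direct path needs one copy and stages nothing, which is the saving the paragraph above describes.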

Magnum IO GPUDirect® Storage (MIO GDS) finally reached general availability (GA) last week, providing a direct IO path to the HGX-2 GPU server, an nVidia® purpose-built supercomputer for AI, ML, DL and HPC.

The MIO GDS announcement sparked a flurry of announcements from key storage vendors. Here are a few notable ones, with more coming along the way:

[ Vast Data ] To Infinity and Beyond AI: GPUDirect Storage is happening
[ WekaIO™ ] nVidia® GPUDirect® Storage plus WekaIO™ provides more than just performance
[ NetApp® ] Boost performance with nVidia® Magnum IO GPUDirect® Storage (broken link)
[ Excelero ] Direct GPU-to-Drive paths for latency sensitive applications with nVidia® Magnum IO GPUDirect® Storage and Excelero
[ DDN ] DDN boosts AI storage leadership with Exascaler 6 and expanded portfolio for Enterprise Intelligent Infrastructure

There are many more vendors lined up – Pavilion Data, ScaleFlux, Hitachi Vantara, IBM, DellEMC, Liqid and a few more.

New workloads diluting the CPU

The new generation of modern data workloads like AI and machine learning is certainly driving a straight, unobstructed path to the GPU. Decentralization and disaggregation of workloads have led to a mini balkanization of sorts, dividing the once feudalistic landscape into multiple computational sub-variants. We are seeing the rise of DPUs (data processing units), SmartNICs, computational storage, FPGAs (field programmable gate arrays), TPUs (Tensor Processing Units), VPUs (vision processing units), and many other novel, fuzzy terms related to computational processing.

Just weeks ago, Intel®, the once feudal lord of this outdated fiefdom, announced IPUs (infrastructure processing units) to rival these many upstarts and complement their CPUs in offloading and bypassing certain computational workloads. How the world has changed.

Intel® GPU rising

With all these faux Nostradamus-esque tales of Intel®’s doom in the GPU market, and the tepid challenge posed by the GPU-like Intel® Xeon Phi™ (discontinued in 2020) in previous years, you would think they have given up.

Intel® is coming back into the game. A recent company-wide reorganization under their new CEO, Pat Gelsinger, shows that Intel® is serious about regaining some mindshare in the GPU market. There is now an Accelerated Computing Systems and Graphics (AXG) business unit led by Raja Koduri.

The Intel® Xe architecture is its next-generation GPU architecture as the CPU giant awakes from its slumber. We will see more changes in the compute processing landscape in the coming years.

Data is the catalyst of change

Data is indeed the big catalyst of change. The new workloads such as DL, ML and AI and the data processing requirements have revived the once lifeless HPC (high performance computing) sector into something sexy again. In the same mould, the new data intensive workloads have disrupted the dominant centralized computing architecture where other peripheral functions once circled around the CPU. That is no longer true.

Competitors to the CPU have found a straight storage IO path to their own complex, with a myriad of technology advancements and implementations to increase efficiency and performance in every way. It is exciting to see this happening in these formative years. We are on the cusp of the next storage IO leap.