Over the weekend, I read a couple of articles that pinned the blame on outdated networking infrastructure for slowing down AI ambitions. The 2 articles are:
- The AI Infrastructure bottleneck no one talks about (which turned out to be a not-so-subtle play for Netris, a secure multi-tenant network provisioning technology).
- Data Infrastructure: The missing link in successful AI adoption (a more subtle introduction of Indicium, an AI data services company).
I did not fully agree that networking infrastructure is the main inhibitor of AI ambitions per se, not from my own experience nor from what I know of the present developments in high performance networking. In fact, AI networking infrastructure has been growing by leaps and bounds, laying down ultra-high throughput plumbing between the GPUs (and, by extension, up the stack to the AI models and applications) and the data storage infrastructure.
The NVIDIA-heavy GPU compute infrastructure is, of course, dominated by NVIDIA's own networking infrastructure. NVIDIA Spectrum (Ethernet) and Quantum (InfiniBand) switches, along with BlueField DPUs (data processing units), ConnectX adapters and LinkX cables, are the mainstays of DGX Cloud, and a big part of NVIDIA Cloud Partners (NCPs) as well.

In fact, at one of DDN's NCP customers, I have seen a 10-node DDN EXAScaler cluster deliver almost 1.1TB/sec read and 750GB/sec write throughput to the GPU compute cluster, out of the box, all with 200Gbps networking gear.
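As a back-of-the-envelope check (my own arithmetic, not DDN's published numbers), the sketch below works out what those figures imply per storage node, assuming a 200Gbps link carries roughly 25 GB/sec of usable bandwidth:

```python
# Back-of-the-envelope check of the cluster numbers above.
# Assumption (mine): a 200Gbps link delivers ~25 GB/sec, ignoring protocol overhead.
LINK_GB_PER_SEC = 200 / 8            # ~25 GB/sec usable per 200Gbps link

nodes = 10
read_total_gb_s = 1100               # ~1.1 TB/sec aggregate read
write_total_gb_s = 750               # 750 GB/sec aggregate write

read_per_node = read_total_gb_s / nodes            # 110 GB/sec per node
links_per_node = read_per_node / LINK_GB_PER_SEC   # ~4.4 links' worth

print(f"Per-node read: {read_per_node:.0f} GB/sec "
      f"(~{links_per_node:.1f} x 200Gbps links' worth per node)")
```

In other words, each node has to drive several 200Gbps links flat out, which is exactly why I say the plumbing has kept pace.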
The rise of networking in AI infrastructure
To further support my take on ultra-high performance networking for AI compute, here are a few recent technology news items that might change your mind.
- Ultra Ethernet Consortium (UEC) launches specification 1.0, transforming Ethernet for AI and HPC at scale – note: Ultra Ethernet (NVIDIA is also a member) will be a primary competitor to NVIDIA InfiniBand.
- Broadcom scales up Ethernet with Tomahawk Ultra for low latency HPC and AI – note: Broadcom SUE (Scale-Up Ethernet) will be a competitor to UALink (Ultra Accelerator Link), an up-and-coming accelerator-to-accelerator networking standard which is supposed to compete with NVIDIA NVLink and NVSwitch.
- Inside the NVIDIA Cedar module with 1.6Tbps of networking capacity.
There are many more AI networking technology developments making their mark to accelerate AI ambitions. I am not convinced that these are the bottlenecks.
It is not just storage speed now
I have been seeing the AI infrastructure (compute, networking, storage) vendors innovating to hyper-increase data throughput, for both Read I/Os and Write I/Os. Many have done well in the Read department. A well-designed data infrastructure (storage) cluster can easily deliver three-figure GB/sec speeds with standard 100Gbps Ethernet or higher.
Write throughput is a different story, and not all data storage vendors excel in this department. Many aggregate both read and write numbers to make the whole performance thingy look aesthetically pleasing.
But the hardest part of data delivery from data storage to the AI compute infrastructure is not throughput; it is latency.
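A tiny illustration, with made-up numbers of my own, of why a single aggregated figure can flatter a cluster whose write path is much weaker:

```python
# Illustrative (made-up) numbers: strong reads, much weaker writes.
read_gb_s = 120
write_gb_s = 35

aggregate = read_gb_s + write_gb_s   # the headline-friendly number
print(f"Aggregate throughput: {aggregate} GB/sec")

# But a write-heavy AI stage (e.g. model checkpointing) only ever
# experiences the write number, not the blended one.
print(f"Checkpoint-bound workloads see: {write_gb_s} GB/sec")
```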
I have observed that many vendors are trying to collapse the service round trip between GPUs and the datasets, all in the name of reducing latency.
What is Storage Latency?
A Google search with AI overview (copied verbatim) reveals that “Storage latency refers to the time delay between a storage device receiving a data request and the completion of that request, encompassing both read and write operations. It’s a critical performance metric in data storage, directly impacting application responsiveness and overall system efficiency. Lower latency means faster access to data, which is crucial for applications that require quick responses, like online transactions or streaming services.”
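That request-to-completion delay is easy enough to probe from the client side. Here is a minimal sketch, assuming a large file sitting on the storage mount (the path is a hypothetical example), that times small random reads and reports percentiles:

```python
import os
import random
import statistics
import time

# Minimal client-side latency probe: time 4KiB random reads against a file
# on the storage mount. PATH is a hypothetical example; the file should be
# much larger than RAM (or opened with O_DIRECT) so the OS page cache does
# not flatter the numbers.
PATH = "/mnt/storage/testfile.bin"
BLOCK = 4096
SAMPLES = 1000

size = os.path.getsize(PATH)
latencies_us = []
fd = os.open(PATH, os.O_RDONLY)
try:
    for _ in range(SAMPLES):
        offset = random.randrange(0, size - BLOCK)
        start = time.perf_counter()
        os.pread(fd, BLOCK, offset)          # one small read request
        latencies_us.append((time.perf_counter() - start) * 1e6)
finally:
    os.close(fd)

latencies_us.sort()
print(f"p50: {statistics.median(latencies_us):.1f} us, "
      f"p99: {latencies_us[int(0.99 * SAMPLES)]:.1f} us")
```

The p99 figure matters as much as the median; inferencing pipelines feel the tail, not the average.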
In the galumphing charge of AI-in-everything technology now, latency has become a vital measurement of AI inferencing's progress and success from a data infrastructure point of view.
The AI Data Pipeline
In my work, I often bring up the AI Data Pipeline. It is about how data is sourced, procured, created and ingested into the AI data architecture, and how data is prepared and enriched before it is used to train the foundation and frontier models. Then we have to think about how the trained models are optimized and enhanced before they are put to work inferencing in production. Below is the AI Data Pipeline diagram I often use in my customer presentations.
To understand performance bottlenecks, especially in the data fueling the AI ecosystem, I look at data points within the AI Data Pipeline's plumbing, relative to the high performance compute elements in the ecosystem, from the infrastructure perspective.
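A rough sketch of what I mean by data points in the plumbing: timestamp each stage hand-off so the slow segment shows itself. The stage names below are my own shorthand, not a standard:

```python
import time
from contextlib import contextmanager

# Rough sketch: timestamp each hand-off in the AI Data Pipeline so the slow
# "plumbing" between stages shows up. Stage names are my own shorthand.
STAGES = ["ingest", "prepare/enrich", "train", "optimize", "inference"]
timings = {}

@contextmanager
def stage(name):
    start = time.perf_counter()
    yield
    timings[name] = time.perf_counter() - start

# Example usage with placeholder work standing in for real data movement:
for name in STAGES:
    with stage(name):
        time.sleep(0.01)

for name, seconds in timings.items():
    print(f"{name:>16}: {seconds * 1e3:.1f} ms")
```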
What are some of them doing?
I see several storage-heavy vendors getting deeper, beyond just the networking infrastructure. The most obvious one is NVIDIA's GPUDirect Storage, part of its Magnum IO family. I wrote a little bit about Magnum IO back in 2021, and even with my shallow knowledge then, it was an early indicator of how NVIDIA software could be used to control data movement and data placement to achieve a higher magnitude of performance.
Techniques like data movement and data placement are not new. They have been used in supercomputing for decades, and in enterprise storage as well. The NetApp PAM (Performance Acceleration Module) card was one I was familiar with, and earlier still, the Fusion-io flash memory modules used intelligent data caching (a form of controlling where the most frequently and most recently accessed data is placed). The ZFS filesystem likewise extends its ARC (Adaptive Replacement Cache) onto fast storage media, as the L2ARC, to improve read data serving response times.
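A toy sketch of the idea behind these caches (not ZFS's actual ARC implementation): keep one list for blocks seen once and promote blocks touched again into a frequency list, so the repeat customers keep their place in fast media:

```python
from collections import OrderedDict

# Toy ARC-flavoured cache (not ZFS's real implementation): blocks touched
# once live in a recency list; blocks touched again are promoted to a
# frequency list, so hot data stays resident in the fast medium.
class ToyAdaptiveCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()    # seen once
        self.frequent = OrderedDict()  # seen more than once

    def get(self, block):
        if block in self.frequent:
            self.frequent.move_to_end(block)   # refresh within the hot list
            return True
        if block in self.recent:
            self.frequent[block] = self.recent.pop(block)  # promote
            return True
        self._insert(block)
        return False

    def _insert(self, block):
        self.recent[block] = True
        while len(self.recent) + len(self.frequent) > self.capacity:
            victim = self.recent or self.frequent  # evict the cold list first
            victim.popitem(last=False)

cache = ToyAdaptiveCache(capacity=4)
for b in [1, 2, 1, 3, 4, 5, 1]:
    print(b, "hit" if cache.get(b) else "miss")
```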
Another technique is to place the data (especially read data) closer to the compute. This is very much client-side data placement, as I have seen in several storage vendors. When I joined DDN, I was asked to understand the Lustre Persistent Client Cache deeply; it found its way into the Sunway TaihuLight supercomputer (once the fastest in the world, between 2016 and 2018). More recently, Hammerspace introduced Tier 0, which turns the local NVMe drives into an ultra-high performance data tier with ultra-low latency.
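Conceptually, this is a read-through cache: the first miss pulls the data from the remote store onto the local NVMe tier, and every later read stays local. A minimal sketch of that shape (not Lustre PCC's or Hammerspace's actual mechanism; the paths are hypothetical):

```python
import os
import shutil

# Minimal read-through sketch of client-side data placement. Not the actual
# Lustre PCC or Hammerspace Tier 0 mechanism; paths are hypothetical.
LOCAL_NVME = "/nvme/cache"     # fast local tier on the compute node
REMOTE = "/mnt/shared"         # shared storage across the network

def read(relpath):
    local = os.path.join(LOCAL_NVME, relpath)
    if not os.path.exists(local):                 # first touch: a miss
        os.makedirs(os.path.dirname(local), exist_ok=True)
        shutil.copy(os.path.join(REMOTE, relpath), local)  # pull once
    with open(local, "rb") as f:                  # all later reads stay local
        return f.read()
```

The network round trip is paid once per dataset, not once per read, which is the whole latency argument in one function.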
Weka recently introduced NeuralMesh. One of its features is the ability to control and address files with small block sizes, usually the metadata types used in AI data. This is a data placement technique (I do not have the deep technical details yet), where small-block files are optimized with greater efficiency and support. Incidentally, Weka has also introduced their Augmented Memory Grid, an integration with distributed inference engines such as NVIDIA Dynamo to provide a larger key-value (KV) cache memory footprint supporting large context-aware processing.
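I do not have Weka's internals, but the general shape of a larger KV cache footprint is easy to sketch: keep the hot key-value entries in a small fast tier (standing in for GPU memory) and spill the rest to a larger, slower pool, promoting entries back on access. This is a conceptual sketch only, not Weka's or Dynamo's implementation:

```python
from collections import OrderedDict

# Conceptual two-tier KV cache sketch (not Weka's or Dynamo's code): the
# small fast tier stands in for GPU memory; the big tier for the augmented,
# storage-backed pool that extends the effective context window.
class TieredKVCache:
    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # small, fast, e.g. HBM-resident entries
        self.big = {}               # large spillover tier
        self.fast_capacity = fast_capacity

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        if len(self.fast) > self.fast_capacity:
            cold_key, cold_val = self.fast.popitem(last=False)
            self.big[cold_key] = cold_val        # spill the coldest entry

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.big:
            self.put(key, self.big.pop(key))     # promote back to fast tier
            return self.fast[key]
        return None
```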
Others like Vast Data are integrating parts of the AI and Machine Learning (ML) data pipeline into their Data Platform, again to enhance its prowess in accelerating data and controlling how data movement and placement work within their technology.
The use of BlueField-3 DPUs (data processing units) in vendors' platforms, like DDN Infinia, Weka on BlueField-3, and Vast Data on BlueField-3, is not just to curry favour with NVIDIA. DPUs are surely taking over some of the repetitive and resource-intensive tasks from the CPUs, but they are also changing the paradigm of how data is processed. The BlueField-3 DPUs make data more secure, more performant, and more tightly integrated with the NVIDIA ecosystem. This, again, is a data movement and data placement technique in play, all to hit the lowest storage latency possible.
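There is no one-liner for real DPU offload (NVIDIA's DOCA SDK is its own world), but the division of labour is simple to sketch. The `DpuOffload` class below is a hypothetical stand-in of my own, not a real BlueField API; the point is where the per-buffer work lands, not the interface:

```python
import zlib

# Purely illustrative: DpuOffload is a hypothetical stand-in, NOT a real
# DOCA/BlueField API. It models the idea that bulk per-buffer work (here a
# checksum) leaves the host CPU entirely.
class DpuOffload:
    def checksum(self, buffer: bytes) -> int:
        # On a real DPU, this runs on the device, not the host.
        return zlib.crc32(buffer)

def host_path(buffers):
    return [zlib.crc32(b) for b in buffers]       # burns host CPU cycles

def offload_path(buffers, dpu: DpuOffload):
    return [dpu.checksum(b) for b in buffers]     # host CPU freed for apps
```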
Where is the Infrastructure Pendulum now?
I have been in the infrastructure industry long enough to know that the performance bottleneck never stays stuck in one place. The need to get the data, to move the data, to process the data and more, swings like a pendulum among the 3 pieces of infrastructure – Compute, Network, Storage. At this point, as the voluminous footprint of data grows exponentially, I would say Storage, or in AI-speak, Data Infrastructure, is facing the greatest challenge. That is why innovative techniques like the ones I have mentioned, and more that I have not covered, are about collapsing the gap of data movement and data placement to the right compute processing, up the stack, to the edge-core-cloud and back.
Data is heavy. I describe it as the elephant of the 3 infrastructure layers, while in the compute infrastructure, data is as fleeting as birds. The networking infrastructure layer in between has gotten better, but storage professionals like myself understand data. It will be forever growing and forever demanding more. The GPUs have jumped ahead, and so have the networking pipes. Storage infrastructure is getting better as well, but it will be the intelligent data movement and data placement technologies that separate the data storage heroes from the wanna-bes.