[Disclosure: I was invited by Dell Technologies as a delegate to their Dell Technologies World 2019 Conference from Apr 29-May 1, 2019 in Las Vegas, USA. Tech Field Day Extra was an included activity as part of Dell Technologies World. My expenses, travel, accommodation and conference fees were covered by Dell Technologies, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog represents my own opinions and views]
Deep Learning, Neural Networks, Machine Learning and subsequently Artificial Intelligence (AI) are the new generation of applications and workloads for commercial HPC systems. They differ from the traditional scientific and engineering HPC workloads, and I have written about this new dawn of supercomputing and the attractive posture of commercial HPC.
Don’t be idle
From the business perspective, the investment in HPC systems is high most of the time, and justifying it to the executives and the investors is not easy. Therefore, it is critical to keep feeding the HPC systems and to minimize the idle time of compute, GPUs, network and storage.
However, almost all HPC systems today are inflexible. Once assigned to a project, the resources pretty much stay with that project, even when the project's workload processing is idle and waiting. Of course, we have to bear in mind that not all resources are fully abstracted, virtualized and software-defined, whereby you can carve out pieces of the hardware and deliver a percentage of that resource. A case in point is the CPU, where you cannot assign half of its clock cycles to one project and the other half to another. The technology isn't there yet. Certain resources like the GPU are going down the path of virtual GPUs, and into the realm of resource disaggregation. Eventually, all resources of the HPC systems – CPU, memory, FPGA, GPU, PCIe channels, NVMe paths, IOPS, bandwidth, burst buffers etc. – should be disaggregated and pooled for disparate applications and workloads based on demands of usage, time and performance.
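To make the pooling idea concrete, here is a minimal sketch of disaggregation in plain Python. Everything in it – the ResourcePool class, compose_node and release – is a hypothetical illustration of the pattern, not any vendor's actual API.

```python
# A minimal sketch of disaggregated resource pooling. All names here
# (ResourcePool, compose_node, release) are hypothetical, for illustration.
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Free pools of disaggregated devices, keyed by resource type."""
    free: dict = field(default_factory=lambda: {
        "gpu": ["gpu0", "gpu1", "gpu2", "gpu3"],
        "nvme": ["nvme0", "nvme1"],
        "fpga": ["fpga0"],
    })

    def compose_node(self, **wanted):
        """Carve a logical node out of the free pools (e.g. gpu=2, nvme=1)."""
        node = {}
        for rtype, count in wanted.items():
            if len(self.free.get(rtype, [])) < count:
                raise RuntimeError(f"not enough free {rtype} devices")
            node[rtype] = [self.free[rtype].pop() for _ in range(count)]
        return node

    def release(self, node):
        """Return a node's devices to the pools once its workload goes idle."""
        for rtype, devices in node.items():
            self.free[rtype].extend(devices)


pool = ResourcePool()
training_node = pool.compose_node(gpu=2, nvme=1)  # hot AI/ML training job
print(training_node)                              # {'gpu': [...], 'nvme': [...]}
pool.release(training_node)                       # project goes cold: devices return
```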
Hence we are beginning to see disaggregated HPC system resources composed and built up to meet the diverse mix and needs of HPC applications and workloads. This is even more acute when an AI project might grow cold, but the training of AI/ML/DL workloads continues to stay hot.
Liqid, the early leader in Composable Architecture
At Tech Field Day Extra Dell Technologies World 2019, the delegates and I were introduced to Liqid Inc.
The video above gives a quick introduction to Liqid Composable Dynamic Infrastructure (CDI), and fellow delegate Adam Post wrote a great piece about Liqid's technology and architecture.
Conjuring up the magic
The most interesting bits of the Liqid Composable Architecture technology for me are the Liqid Grid and the Liqid Command Center.
From the diagram below, the Liqid Grid is the high-bandwidth (up to 192 GB/sec), low-latency (~150 nanoseconds) managed fabric switch. It leverages PCIe Gen 3 and Gen 4, as well as high-bandwidth Ethernet and InfiniBand, to deliver the Composable Dynamic Infrastructure (CDI) architecture with multiple disaggregated HPC resources such as compute, storage, networking and graphics.
The Liqid Command Center is the orchestrator, the composer of “bare-metal servers” for various required workloads and applications.
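To give a feel for what composing a "bare-metal server" involves, here is a hedged illustration of the kind of request an orchestrator like Liqid Command Center might take in. The field names and payload shape below are my assumptions for illustration, not Liqid's documented API.

```python
# A hypothetical composition request for an orchestrator that builds
# "bare-metal servers" out of pooled, fabric-attached devices.
# The field names and values are illustrative assumptions only.
import json

compose_request = {
    "name": "dl-training-01",   # logical bare-metal server to compose
    "cpu_node": "node-07",      # host CPU/memory resource
    "gpus": 4,                  # pulled from the GPU pool over the fabric
    "nvme_drives": 2,           # pulled from the NVMe pool
    "nics": 1,                  # fabric-attached network adapter
}

# An orchestrator would take a request like this and program the PCIe
# fabric switch so the pooled devices appear local to the chosen host.
print(json.dumps(compose_request, indent=2))
```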
The PCIe leveler
In the whole scheme of things, the common denominator is PCIe (Peripheral Component Interconnect Express), the low-level, high-speed serial bus interface found on every system board. It connects and drives NVMe devices, GPU cards and network adapters, and carries many host-to-device and device-to-device communications.
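You can see this common denominator for yourself on any Linux host by walking the standard sysfs PCIe tree, where NVMe drives, GPUs and NICs all show up as peers on the same bus:

```python
# Enumerate PCIe devices through the standard Linux sysfs tree.
# Linux-only; the vendor and class codes shown are examples.
from pathlib import Path

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    vendor = (dev / "vendor").read_text().strip()  # e.g. 0x10de (NVIDIA)
    cls = (dev / "class").read_text().strip()      # e.g. 0x010802 (NVMe)
    print(f"{dev.name}  vendor={vendor}  class={cls}")
```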
And there are a lot of new developments happening in the low-level interconnect landscape, usually away from the view of the application level, which will only excite storage geeks like me. Although these are not my strength at this point, I have followed the developments of the Open Fabrics Alliance (OFA), and the herding of several system interconnects – CCIX (Cache Coherent Interconnect for Accelerators), Gen-Z and OpenCAPI (Open Coherent Accelerator Processor Interface). Adding to the growing tribe of interconnect links and protocols are NVIDIA NVLink and Intel CXL (Compute Express Link).
The intention of these low-level interconnect specifications and protocols is tighter integration between devices, especially between processors and accelerators such as FPGAs (Field Programmable Gate Arrays) and SoCs (Systems on Chip), and also between DRAM and non-volatile memory such as NVDIMMs.
The Liqid advantage
Liqid is clearly the early market innovator for Composable Infrastructure. It has moved ahead of the pack with an open, software-defined, unified multi-fabric composable infrastructure platform, delivering a simple-to-use, high-bandwidth, low-latency architecture to meet the mixed and diverse applications and workloads of the commercial HPC market space.
And the time is apt. The requirements and demands of commercial HPC applications and workloads are rising, and keeping system resource utilization at the optimum level not only justifies the HPC investment but also ensures a smooth flow through the deep learning/machine learning data pipeline. This "non-blocking", minimal wait-time behaviour is important to keep the entire DL/ML flow chugging forward with the right amount of system resources at each stage.
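As a toy illustration of that non-blocking handoff (the stage names and queue sizes are mine, not from the Liqid material), here is a staged pipeline where bounded queues apply backpressure between stages instead of stalling the whole flow:

```python
# A toy sketch of a staged DL/ML data pipeline. Each stage hands work to
# the next through a bounded queue, so a slow stage backpressures its
# upstream rather than blocking the entire flow. Illustrative only.
import queue
import threading

ingest_q, train_q = queue.Queue(maxsize=4), queue.Queue(maxsize=4)

def ingest():
    for batch in range(8):           # e.g. reads from fabric-attached NVMe
        ingest_q.put(f"raw-{batch}")
    ingest_q.put(None)               # sentinel: no more data

def preprocess():
    while (item := ingest_q.get()) is not None:
        train_q.put(item.replace("raw", "tensor"))  # e.g. CPU-side transform
    train_q.put(None)

def train():
    while (item := train_q.get()) is not None:
        print("training on", item)   # e.g. runs on pooled GPUs

threads = [threading.Thread(target=f) for f in (ingest, preprocess, train)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```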
I was excited to see that Liqid has opened up the composable infrastructure market, challenging HPE Synergy. It is still a nascent market, and one which will attract more challengers in the years to come. But Liqid's CEO, Sumit Puri, is confident that Liqid is ready to take on the world.
World domination? You bet.