Sexy HPC storage is all the rage

HPC is sexy

There is no denying it. HPC is sexy. HPC Storage is just as sexy.

Looking at the latest buzz from the Supercomputing Conference 2018 (SC18), which took place in Dallas 2 weeks ago, the number of storage-related vendors participating was staggering. Panasas, Weka.io, Excelero and BeeGFS are the ones that I know of because I have friends posting their highlights. Then there are the perennial vendors like IBM, Dell, HPE, NetApp, Huawei, Supermicro, and so many more. A quick check on the SC18 website showed that there were 391 exhibitors on the floor.

And this is driven by the relentless demand for ever higher computing performance and, along with it, the demand for ever faster storage performance. The commercialization of Artificial Intelligence (AI), Deep Learning (DL) and newer applications and workloads, together with the traditional HPC workloads, is driving these ever-increasing requirements. However, contrary to what many have been led to believe, most enterprise storage platforms were not designed to meet the demands of this new generation of applications and workloads. Why so?

I had a couple of conversations with a few well-known vendors around the topic of HPC storage. Several of the responses thrown back were simply to use Flash and NVMe to solve the high performance demands of HPC storage. In my mind, these responses were too trivial, too irresponsible. So I wanted to write this blog to share my views on HPC storage, and not just about its performance.

The HPC lines are blurring

I picked up this video (below) a few days ago. It is an interview by insideHPC's Rich Brueckner with Dr. Goh Eng Lim, HPE CTO and renowned HPC expert, about the convergence of traditional and commercial HPC applications and workloads.

I liked the conversation in the video because it addressed the 2 different approaches. And I welcomed Dr. Goh’s invitation to the Commercial HPC community to work with the Traditional HPC vendors to help push the envelope towards Exascale supercomputing.

HPC storage workloads have always come from technical and scientific applications such as Computational Fluid Dynamics (CFD), Seismic Processing in Exploration and Production, Electronic Design Automation (EDA), Wind Tunnel Simulation, Genome Sequencing and so on. With the growing attraction of AI/DL, traditional HPC vendors and storage platforms such as Cray (which acquired Seagate's ClusterStor), IBM GPFS, HPE (mostly through the SGI acquisition) and Lustre (through its long and convoluted history) are moving down to meet the demands of this new Commercial HPC market.

From the opposite pole, we are now seeing Enterprise storage vendors throwing their hats into the Commercial HPC ring. Several have an AI spin to them, notably Pure Storage AIRI (AI Ready Infrastructure), NetApp ONTAP AI, and DataDirect Networks (DDN) A3I. Panasas, a long-time purveyor of Commercial HPC storage, has just released their PanFS and ActiveStor Ultra at SC18, upping their game in this exploding market.

A couple of startups that presented at the Storage Field Day events I attended also showed impressive parallel storage performance. Weka.io claimed the fastest parallel filesystem in the world with a record benchmark, only to be topped by E8 Storage a few months later with another record benchmark. And there are many more HPC storage companies in the mix that I have not mentioned.

Even Top500, which publishes the list of the top supercomputers in the world twice a year, has admitted in a recent blog post that so much has changed that it is now harder to differentiate what is Traditional HPC and what is Commercial HPC. The lines have certainly blurred.

HPC workload anyone?

HPC storage is different. Commercial HPC workloads demand certain characteristics of a storage platform, and most of the time these cannot be equated to what an Enterprise Storage platform offers. In my mind, 3 characteristics stand out:

  • Metadata
  • Mixed random and sequential workloads; Small & large blocks
  • Networking – High bandwidth; low latency

Metadata is crucial in HPC workloads. Many of the workloads are based on unstructured data and files. As large pieces of data or files are chunked or sharded across the scale-out HPC platform, it is important to know which node of the scale-out each chunk or shard went to. Thus there are usually metadata servers in the HPC infrastructure mix. Lustre has Metadata Servers and Targets which are separated from its Object Servers and Targets. The same goes for BeeGFS, a better supported Lustre lookalike. Veritas Cluster File System (ok, I have not looked much into parallel file systems for a long time) has more active and participative metadata servers in its design.
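To make the role of metadata a little more concrete, here is a minimal Python sketch of a toy metadata service. It only records how a file is striped and answers the question "which target holds this byte?". It assumes plain round-robin striping, and the names `StripeLayout`, `MetadataServer` and the `ost0`-style target names are hypothetical illustrations, not the API of Lustre, BeeGFS or any real file system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    """How one file is spread over storage targets (assumed round-robin)."""
    stripe_size: int   # bytes per chunk
    targets: tuple     # hypothetical target names, in stripe order

    def locate(self, offset):
        """Return (target, offset inside that target's object) for a file offset."""
        chunk_index = offset // self.stripe_size
        target = self.targets[chunk_index % len(self.targets)]
        # Each target stores every len(targets)-th chunk, packed back to back.
        local_chunk = chunk_index // len(self.targets)
        local_offset = local_chunk * self.stripe_size + offset % self.stripe_size
        return target, local_offset


class MetadataServer:
    """Toy metadata service: maps a file path to its stripe layout."""

    def __init__(self):
        self._layouts = {}

    def create(self, path, stripe_size, targets):
        self._layouts[path] = StripeLayout(stripe_size, tuple(targets))

    def lookup(self, path):
        return self._layouts[path]


MIB = 1 << 20
mds = MetadataServer()
mds.create("/data/sim.out", MIB, ["ost0", "ost1", "ost2"])
layout = mds.lookup("/data/sim.out")
print(layout.locate(3 * MIB + 5))  # fourth chunk wraps back to the first target
```

In this sketch the client asks the metadata service once for the layout and can then compute chunk locations itself and talk to the data targets directly. That separation of the metadata path from the data path is the idea behind the designs mentioned above, although real systems add locking, replication and much more.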

I pulled out an old SNIA document just to share the metadata server designs I described above.

The second characteristic I mentioned is the disparate mix of workloads. In operations, there could be several large and small jobs running. Each job has a mix of small-block random reads/writes coupled with small- and large-block sequential reads. The demands of this disparate mix, married with the explosive networking demands, will choke an enterprise storage platform in an instant.
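A small Python sketch can illustrate why this mix is hard on storage. It classifies a stream of requests as sequential or random, using the simple (assumed) rule that a request is sequential only if it starts where the previous one ended. The function name `classify_requests` and the sample offsets are hypothetical, chosen only for illustration.

```python
def classify_requests(requests):
    """Count sequential vs random requests in arrival order.

    requests: list of (offset, size) tuples against one storage target.
    A request counts as sequential only if it starts exactly where the
    previous request ended; the first request has no predecessor and is
    counted as random.
    """
    counts = {"sequential": 0, "random": 0}
    expected_next = None
    for offset, size in requests:
        if expected_next is not None and offset == expected_next:
            counts["sequential"] += 1
        else:
            counts["random"] += 1
        expected_next = offset + size
    return counts


MIB = 1 << 20
# Job A streams a large file in 1 MiB reads.
job_a = [(0, MIB), (MIB, MIB), (2 * MIB, MIB)]
# Job B issues small 4 KiB reads at scattered offsets.
job_b = [(40960, 4096), (8192, 4096), (500000, 4096)]

# What the storage sees when both jobs run at once: requests interleaved.
interleaved = [req for pair in zip(job_a, job_b) for req in pair]

print(classify_requests(job_a))
print(classify_requests(interleaved))
```

On its own, job A is mostly sequential. Once its requests are interleaved with job B's, none of them arrive back to back any more, so under this simple rule the storage sees a stream that looks entirely random.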

When it comes to reading large files, the distributed scale-out platform must deliver high throughput, typically in the tens of Gigabytes per second. This can only be achieved with very low latency, high bandwidth network switching, and perennial networking powerhouse Mellanox has been in most of the HPC mix (both Traditional and Commercial) with its InfiniBand and RDMA technologies.
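As a rough back-of-the-envelope sketch of why the network matters, the Python snippet below estimates how many network links are needed to carry a target aggregate throughput. The 100 Gbit/s figure corresponds to an EDR InfiniBand link; the 80% usable-bandwidth efficiency and the function name `links_needed` are my own assumptions for illustration, not measured values.

```python
import math


def links_needed(target_gbytes_per_s, link_gbits_per_s, efficiency=0.8):
    """Estimate the number of links needed for a target throughput.

    target_gbytes_per_s: desired aggregate throughput in GB/s
    link_gbits_per_s:    raw link speed in Gbit/s (e.g. 100 for EDR InfiniBand)
    efficiency:          assumed fraction of raw bandwidth usable for data
    """
    usable_gbytes_per_s = link_gbits_per_s / 8 * efficiency
    return math.ceil(target_gbytes_per_s / usable_gbytes_per_s)


# 45 GB/s over 100 Gbit/s links at an assumed 80% efficiency
print(links_needed(45, 100))  # → 5

# The same target over 10 Gbit/s Ethernet at the same assumed efficiency
print(links_needed(45, 10))
```

The point of the arithmetic is simply that tens of Gigabytes per second quickly translates into many parallel high-speed links, which is why the network fabric is a first-class part of an HPC storage design.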

My little story…

I was already interested in HPC in the 90s. Given the limited resources I had at the time, and the lack of access to HPC software, I attempted to build a Beowulf Cluster. It wasn’t easy because I was still on dial-up, and my home computer was still a Pentium with 2GB and a 100Mbit network. I borrowed another computer from my brother-in-law and linked both computers up to attempt the impossible. I didn’t succeed, but I learned the foundations of HPC.

Meanwhile, my day job with Sun Microsystems at the time was to configure and deploy the Sun E10000 Starfire. The E10K came from one of the Cray Research divisions. I had a blast with the E10K, but the price band of these systems was for large enterprises, beyond the commercial market.

Comments are welcome

Earlier this year, I wrote about HPC and its presence in Asia. HPC can be an enabler and can give Asian companies a level playing field to innovate.

As always, it was a pleasure to share and your comments are welcome.


About cfheoh

I am a technology blogger with 20+ years of IT experience. I write heavily on technologies related to storage networking and data management because that is my area of interest and expertise. I introduce technologies with the objective of getting readers to *know the facts*, and to use that knowledge to cut through the marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress. I am involved in SNIA (Storage Networking Industry Association) and as of October 2013, I have been appointed as the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I was previously the Chairman of SNIA Malaysia until Dec 2012. As of August 2015, I am returning to NetApp to be the Country Manager of Malaysia & Brunei. Given my present position, I am not obligated to write about my employer and its technology, but I am indeed subject to the Social Media Guidelines of the company. Therefore, I would like to make a disclaimer that what I write is my personal opinion, and mine alone. I am responsible for what I say and write, and this statement indemnifies my employer from any damages.
