Storage Performance Considerations for AI Data Paths

The hype of Deep Learning (DL), Machine Learning (ML) and Artificial Intelligence (AI) has reached an unprecedented frenzy. Every infrastructure vendor from servers, to networking, to storage has a word to say or play about DL/ML/AI. This prompted me to explore this hyped ecosystem from a storage perspective, notably from a storage performance requirement point-of-view.

One question on my mind

There are plenty of questions on my mind. One stood out and that is related to storage performance requirements.

Reading and learning from one storage technology vendor to another, the context of everyone’s play against their competitors seems to be  “They are archaic, they are legacy. Our architecture is built from ground up, modern, NVMe-enabled“. And there are more juxtaposing, but you get the picture – “We are better, no doubt“.

Are the data patterns and behaviours of AI different? How do they affect the storage design as the data moves through the workflow, the data paths and the lifecycle of the AI ecosystem?

Continue reading

The Heart of Digital Transformation is …

Businesses have taken up Digital Transformation in different ways and at different pace. In Malaysia, company boardrooms are accepting Digital Transformation as a core strategic initiative, crucial to develop competitive advantage in their respective industries. Time and time again, we are reminded that Data is the lifeblood and Data fuels the Digital Transformation initiatives.

The rise of CDOs

In line with the rise of the Digital Transformation buzzword, I have seen several unique job titles coming up since a few years ago. Among those titles, “Chief Digital Officer“, “Chief Data Officer“, “Chief Experience Officer” are some eye-catching ones. I have met a few of them, and so far, those I met were outward facing, customer facing. In most of my conversations with them respectively, they projected a front that their organization, their business and operations have been digital transformed. They are ready to help their customers to transform. Are they?

Tech vendors add more fuel

The technology vendors have an agenda to sell their solutions and their services. They paint aesthetically pleasing stories of how their solutions and wares can digitally transform any organizations, and customers latch on to these ‘shiny’ tech. End users get too fixated that technology is the core of Digital Transformation. They are wrong.

Missing the Forest

As I gather more insights through observations, and more conversations and more experiences, I think most of the “digital transformation ready” organizations are not adopting the right approach to Digital Transformation.

Digital Transformation is not tactical. It is not a one-time, big bang action that shifts from not-digitally-transformed to digitally-transformed in a moment. It is not a sprint. It is a marathon. It is a journey that will take time to mature. IDC and its Digital Transformation MaturityScape Framework is spot-on when they first released the framework years ago.

IDC Digital Transformation Maturityscape

Continue reading

Whither HPC, HPE?

HPE is acquiring Cray Inc. Almost 3 years ago, HPE acquired SGI. Back in 2017, HPE partnered WekaIO, and invested big in the latest Series C funding of WekaIO just weeks ago.

Cray, SGI and WekaIO are all strong HPC technology companies. Given the strong uptick in the HPC market, especially commercial HPC, we cannot deny HPE’s ambition to become the top SuperComputing and HPC vendor in the industry. Continue reading

Scaling new HPC with Composable Architecture

[Disclosure: I was invited by Dell Technologies as a delegate to their Dell Technologies World 2019 Conference from Apr 29-May 1, 2019 in the Las Vegas USA. Tech Field Day Extra was an included activity as part of the Dell Technologies World. My expenses, travel, accommodation and conference fees were covered by Dell Technologies, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

Deep Learning, Neural Networks, Machine Learning and subsequently Artificial Intelligence (AI) are the new generation of applications and workloads to the commercial HPC systems. Different from the traditional, more scientific and engineering HPC workloads, I have written about the new dawn of supercomputing and the attractive posture of commercial HPC.

Don’t be idle

From the business perspective, the investment of HPC systems is high most of the time, and justifying it to the executives and the investors is not easy. Therefore, it is critical to keep feeding the HPC systems and significantly minimize the idle times for compute, GPUs, network and storage.

However, almost all HPC systems today are inflexible. Once assigned to a project, the resources pretty much stay with the project, even when the workload processing of the project is idle and waiting. Of course, we have to bear in mind that not all resources are fully abstracted, virtualized and software-defined whereby you can carve out pieces of the hardware and deliver a percentage of that resource. Case in point is the CPU, where you cannot assign certain clock cycles of CPU to one project and another half to the other. The technology isn’t there yet. Certain resources like GPU is going down the path of Virtual GPU, and into the realm of resource disaggregation. Eventually, all resources of the HPC systems – CPU, memory, FPGA, GPU, PCIe channels, NVMe paths, IOPS, bandwidth, burst buffers etc – should be disaggregated and pooled for disparate applications and workloads based on demands of usage, time and performance.

Hence we are beginning to see the disaggregated HPC systems resources composed and built up the meet the diverse mix and needs of HPC applications and workloads. This is even more acute when a AI project might grow cold, but the training of AL/ML/DL workloads continues to stay hot

Liqid the early leader in Composable Architecture

Continue reading

Connecting ideas and people with Dell Influencers

[Disclosure: I was invited by Dell Technologies as a delegate to their Dell Technologies World 2019 Conference from Apr 29-May 1, 2019 in the Las Vegas USA. My expenses, travel, accommodation and conference fees were covered by Dell Technologies, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

I just got home from Vegas yesterday after attending my 2nd Dell Technologies World as one of the Dell Luminaries. The conference was definitely a bigger one than the one last year, with more than 15,000 attendees. And there was a frenzy of announcements, from Dell Technologies Cloud to new infrastructure solutions, and more. The big one for me, obviously was Azure VMware Solutions officiated by Microsoft CEO Satya Nadella and VMware CEO Pat Gelsinger, with Michael Dell bringing together the union. I blogged about Dell jumping into the cloud in a big way.

AI Tweetup

In the razzmatazz, the most memorable moments were one of the Tweetups organized by Dr. Konstanze Alex (Konnie) and her team, and Tech Field Day Extra.

Tweetup was alien to me. I didn’t know how the concept work and I did google tweetup before that. There were a few tweetups on the topics of data protection and 5G, but the one that stood out for me was the AI tweetup.

No alt text provided for this image

Continue reading

Dell go big with Cloud

[Disclaimer: I have been invited by Dell Technologies as a delegate to their Dell Technologies World 2019 Conference from Apr 29-May 1, 2019 in the Las Vegas USA. My expenses, travel and accommodation are covered by Dell Technologies, the organizer and I am not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

Talk about big. Dell Technologies just went big with the Cloud.

The Microsoft Factor

Day 1 of Dell Technologies World 2019 (DTW19) started with a big surprise to many, including yours truly when Michael Dell, together with Pat Gelsinger invited Microsoft CEO, Satya Nadella on stage.

There was nothing new about Microsoft working with Dell Technologies. Both have been great partners since the PC days, but when they announced Azure VMware Solutions to the 15,000+ attendees of the conference, there was a second of disbelief, followed by an ovation of euphoria.

VMware solutions will run native on Microsoft Azure Cloud. The spread of vSphere, VSAN, vCenter, NSX-T and VMware tools and environment will run on Azure Bare Metal Infrastructure at multiple Azure locations. How big is that. Continue reading

Lift and Shift Begone!

I am excited. New technologies are bringing the data (and storage) closer to processing and compute than ever before. I believe the “Lift and Shift” way would be a thing of the past … soon.

Data is heavy

Moving data across the network is painful. Moving data across distributed networks is even more painful. To compile the recent first image of a black hole, an amount of 5PB or more had to shipped for central processing. If this was moved over a 10 Gigabit network, it would have taken weeks.

Furthermore, data has dependencies. Snapshots, clones, and other data relationships with applications and processes render data inert, weighing it down like an anchor of a ship.

When I first started in the industry more than 25 years ago, Direct Attached Storage (DAS) was the dominating storage platform. I had a bulky Sun MultiDisk Pack connected via Fast SCSI to my SPARCstation 2 (diagram below):

Then I was assigned as the implementation engineer for Hock Hua Bank (now defunct) retail banking project in their Sibu HQ in East Malaysia. It was the first Sun SPARCstorage 1000 (photo below), running a direct attached Fibre Channel 0.25 Gbps FCAL (Fibre Channel Arbitrated Loop). It was the cusp of the birth of SAN (Storage Area Network).

Photo from https://www.cca.org/dave/tech/sys5/

The proliferation of SAN over the next 2 decades pushed DAS into obscurity, until SAS (Serial Attached SCSI) came about. Added to the mix was the prominence of Cloud Storage. But on-premises storage and Cloud Storage didn’t always come together. There was always a valley between the 2, until the public clouds gained a stronger foothold in the minds of IT and businesses. Today, both on-premises storage and cloud storage are slowly cosying as one Data Singularity, thanks to vision and conceptualization of data fabrics. NetApp was an early proponent of the Data Fabric concept 4 years ago. Continue reading

WekaIO controls their performance destiny

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in the Silicon Valley USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

I was first introduced to WekaIO back in Storage Field Day 15. I did not blog about them back then, but I have followed their progress quite attentively throughout 2018. 2 Storage Field Days and a year later, they were back for Storage Field Day 18 with a new CTO, Andy Watson, and several performance benchmark records.

Blowout year

2018 was a blowout year for WekaIO. They have experienced over 400% growth, placed #1 in the Virtual Institute IO-500 10-node performance challenge, and also became #1 in the SPEC SFS 2014 performance and latency benchmark. (Note: This record was broken by NetApp a few days later but at a higher cost per client)

The Virtual Institute for I/O IO-500 10-node performance challenge was particularly interesting, because it pitted WekaIO against Oak Ridge National Lab (ORNL) Summit supercomputer, and WekaIO won. Details of the challenge were listed in Blocks and Files and WekaIO Matrix Filesystem became the fastest parallel file system in the world to date.

Control, control and control

I studied WekaIO’s architecture prior to this Field Day. And I spent quite a bit of time digesting and understanding their data paths, I/O paths and control paths, in particular, the diagram below:

Starting from the top right corner of the diagram, applications on the Linux client (running Weka Client software) and it presents to the Linux client as a POSIX-compliant file system. Through the network, the Linux client interacts with the WekaIO kernel-based VFS (virtual file system) driver which coordinates the Front End (grey box in upper right corner) to the Linux client. Other client-based protocols such as NFS, SMB, S3 and HDFS are also supported. The Front End then interacts with the NIC (which can be 10/100G Ethernet, Infiniband, and NVMeoF) through SR-IOV (single root IO virtualization), bypassing the Linux kernel for maximum throughput. This is with WekaIO’s own networking stack in user space. Continue reading

StorPool – Block storage managed well

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in the Silicon Valley USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

Storage technology is complex. Storage infrastructure and data management operations are not trivial, despite what the hyperscalers like Amazon Web Services and Microsoft Azure would like you to think. As the adoption of cloud infrastructure services grow, the small and medium businesses/enterprises (SMB/SME) are usually left to their own devices to manage the virtual storage infrastructure. Cloud Service Providers (CSPs) addressing the SMB/SME market are looking for easier, worry-free, software-defined storage to elevate their value to their customers.

Managed high performance block storage

Enter StorPool.

StorPool is a scale-out block storage technology, capable of delivering 1 million+ IOPS with sub-milliseconds response times. As described by fellow delegate, Ray Lucchesi in his recent blog, they were able to achieve these impressive performance numbers in their demo, without the high throughput RDMA network or the storage class memory of Intel Optane. Continue reading

VAST Data must be something special

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in the Silicon Valley USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

Vast Data coming out bash!

The delegates of Storage Field Days were always the lucky bunch. We have witnessed several storage technology companies coming out of stealth at these Tech Field Days. The recent ones in memory for me were Excelero and Hammerspace. But to have one where the venerable storage doyen, Mr. Howard Marks, Vast Data new tech evangelist, to introduce the deep dive of Vast Data technology was something special.

For those who knew Howard, he is fiercely independent, very storage technology smart, opinionated and not easily impressed. As a storage technology connoisseur myself, I believe Howard must have seen something special in Vast Data. They must be doing something extremely unique and impressive that someone like Howard could not resist, and made him jump to the vendor side. This sets the tone of my blog.

Continue reading