Scaling new HPC with Composable Architecture

[Disclosure: I was invited by Dell Technologies as a delegate to their Dell Technologies World 2019 Conference from Apr 29-May 1, 2019 in Las Vegas, USA. Tech Field Day Extra was an included activity as part of Dell Technologies World. My expenses, travel, accommodation and conference fees were covered by Dell Technologies, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views.]

Deep Learning, Neural Networks, Machine Learning and, by extension, Artificial Intelligence (AI) are the new generation of applications and workloads for commercial HPC systems. They differ from the traditional, more scientific and engineering HPC workloads, and I have written about the new dawn of supercomputing and the attractive posture of commercial HPC.

Don’t be idle

From the business perspective, the investment in HPC systems is usually high, and justifying it to the executives and the investors is not easy. Therefore, it is critical to keep feeding the HPC systems and to minimize idle time for compute, GPUs, network and storage.

However, almost all HPC systems today are inflexible. Once assigned to a project, the resources pretty much stay with the project, even when the project's workload is idle and waiting. Of course, we have to bear in mind that not all resources are fully abstracted, virtualized and software-defined such that you can carve out pieces of the hardware and deliver a percentage of that resource. A case in point is the CPU, where you cannot assign half of the clock cycles to one project and the other half to another. The technology isn’t there yet. Certain resources, like the GPU, are going down the path of Virtual GPU, and into the realm of resource disaggregation. Eventually, all resources of the HPC systems – CPU, memory, FPGA, GPU, PCIe channels, NVMe paths, IOPS, bandwidth, burst buffers etc – should be disaggregated and pooled for disparate applications and workloads based on demands of usage, time and performance.

Hence we are beginning to see disaggregated HPC system resources composed and built up to meet the diverse mix and needs of HPC applications and workloads. This is even more acute when an AI project might grow cold, but the training of AI/ML/DL workloads continues to stay hot.
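To make the idea of pooling and composing concrete, here is a minimal sketch in Python. It is only a toy model: the `ResourcePool` class, its `compose` and `release` methods, and the resource names and counts are all hypothetical, and it does not represent any vendor's actual API. It simply shows resources being carved out of a shared pool for one workload and handed back when that workload goes idle, so another workload can use them.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical pool of disaggregated resources, counted in whole units."""
    free: dict[str, int]
    leases: dict[str, dict[str, int]] = field(default_factory=dict)

    def compose(self, workload: str, request: dict[str, int]) -> bool:
        """Reserve every requested unit for a workload, or nothing at all."""
        if workload in self.leases:
            return False  # this workload already holds a composition
        if any(self.free.get(kind, 0) < units for kind, units in request.items()):
            return False  # not enough free capacity for at least one resource
        for kind, units in request.items():
            self.free[kind] -= units
        self.leases[workload] = dict(request)
        return True

    def release(self, workload: str) -> None:
        """Return an idle workload's units to the shared pool."""
        for kind, units in self.leases.pop(workload, {}).items():
            self.free[kind] += units


# Illustrative numbers only: 8 GPUs and 16 NVMe paths in the pool.
pool = ResourcePool(free={"gpu": 8, "nvme": 16})
pool.compose("training", {"gpu": 6, "nvme": 8})
blocked = pool.compose("inference", {"gpu": 4, "nvme": 2})  # only 2 GPUs are free
pool.release("training")                                    # training goes idle
granted = pool.compose("inference", {"gpu": 4, "nvme": 2})  # now it fits
print(blocked, granted, pool.free)
```

In a fixed-assignment system, the six GPUs would stay with the training project even while it waits. In the sketch, releasing them is what lets the second workload run.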

Liqid the early leader in Composable Architecture


Lift and Shift Begone!

I am excited. New technologies are bringing the data (and storage) closer to processing and compute than ever before. I believe the “Lift and Shift” way will be a thing of the past … soon.

Data is heavy

Moving data across the network is painful. Moving data across distributed networks is even more painful. To compile the recent first image of a black hole, 5PB or more of data had to be shipped for central processing. Had this been moved over a 10 Gigabit network, it would have taken weeks.
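The "weeks" figure can be checked with some quick arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes, 10 Gigabit = 10^10 bits per second) and a single link. The `efficiency` parameter and its 0.7 value are my own illustrative assumption for protocol overhead and contention, not a measured number.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to push size_bytes through one link at a given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (size_bytes * 8) / (link_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


PB = 10**15              # decimal petabyte, in bytes
TEN_GIGABIT = 10 * 10**9  # bits per second

ideal = transfer_days(5 * PB, TEN_GIGABIT)                      # full line rate
realistic = transfer_days(5 * PB, TEN_GIGABIT, efficiency=0.7)  # assumed 70%

print(f"{ideal:.1f} days at line rate")      # 46.3 days at line rate
print(f"{realistic:.1f} days at 70% usage")  # 66.1 days at 70% usage
```

Even at a perfect line rate the transfer takes about six and a half weeks, and any realistic overhead pushes it past two months.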

Furthermore, data has dependencies. Snapshots, clones, and other data relationships with applications and processes render data inert, weighing it down like a ship's anchor.

When I first started in the industry more than 25 years ago, Direct Attached Storage (DAS) was the dominant storage platform. I had a bulky Sun MultiDisk Pack connected via Fast SCSI to my SPARCstation 2 (diagram below):

Then I was assigned as the implementation engineer for the retail banking project of Hock Hua Bank (now defunct) at their Sibu HQ in East Malaysia. It was the first Sun SPARCstorage 1000 (photo below), running a direct attached Fibre Channel 0.25 Gbps FCAL (Fibre Channel Arbitrated Loop). It was on the cusp of the birth of the SAN (Storage Area Network).

Photo from https://www.cca.org/dave/tech/sys5/

The proliferation of SAN over the next 2 decades pushed DAS into obscurity, until SAS (Serial Attached SCSI) came about. Added to the mix was the prominence of Cloud Storage. But on-premises storage and Cloud Storage didn’t always come together. There was always a valley between the 2, until the public clouds gained a stronger foothold in the minds of IT and businesses. Today, both on-premises storage and cloud storage are slowly cosying up as one Data Singularity, thanks to the vision and conceptualization of data fabrics. NetApp was an early proponent of the Data Fabric concept 4 years ago.

Own the Data Pipeline

[Preamble: I was a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation were paid for by GestaltIT, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views.]

I am a big proponent of Go-to-Market (GTM) solutions. Technology does not stand alone. It must be in an ecosystem, and in each industry, in each segment of each respective industry, every ecosystem is unique. And when we amalgamate data, the storage infrastructure technologies and the data management into the ecosystem, we reap the benefits in that ecosystem.

Data moves in the ecosystem, from system to system, north to south, east to west and vice versa, random, sequential, ad-hoc. Data acquires different statuses, different roles, different relevances in its lifecycle through the ecosystem. From it, we derive the flow, a workflow of data creating a data pipeline. The Data Pipeline concept has been around since the inception of data.

To illustrate my point, I created one for the Oil & Gas – Exploration & Production (EP) upstream some years ago.

 


NetApp and IBM gotta take risks

[Preamble: I was a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation were paid for by GestaltIT, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views.]

Storage Field Day 15 was full of technology. There were a few avant-garde companies in the line-up which I liked, but unfortunately NetApp and IBM were the 2 companies that came in at the least interesting end of the spectrum.

IBM presented their Spectrum Protect Plus. The data protection space, especially backup, isn’t exactly my forte when it comes to solution architecture, but I know enough to get by. However, as IBM presented, there were so many questions racing through my mind. I was interrupting myself so much because almost everything presented wasn’t new to me. “Wait a minute … didn’t Company X already have this?” or “Company Y had this years ago” or “Isn’t this…??”

I was questioning myself to validate my understanding of the backup tech shared by the IBM Spectrum Protect Plus team. And they presented with such passion and gusto that I wondered if I was wrong in the first place. Maybe my experience and knowledge in the backup software space weren’t good enough. But then the chatter in the SFD15 Slack channel started pouring in. Comments, unfortunately, were mostly negative, and jibes became jokes. One comment, in particular, nailed it: “This is Veeam 0.2”. Then someone else downgraded it to version 0.1.
