Mellanox Technologies – Storage Gaga

The AI Platformization of Storage – The Data Intelligence Platform

By cfheoh | March 3, 2025 - 7:30 am |March 3, 2025 Algorithm, Analytics, API, Artificial Intelligence, Big Data, Cloud, Clusters, Containers, Data Direct Networks, Data Fabric, Data Governance, Data Management, Data Security, DDN, Deep Learning, Digital Transformation, eDiscovery, Filesystems, High Performance Computing, Infiniband, Kubernetes, Linux, Lustre, Machine Learning, Mellanox Technologies, nVidia, Object Storage, Openstack, RDMA, Scale-out architecture, Software Defined Storage, Software-defined Datacenter, Storage Optimization


The IT industry uses the word “platform” all the time. Often, I find myself loosely shifting between the many jargon terms circling “platform”. I am pretty sure many others are doing so as well.

I finally found the word “platformization” giving the right vibes, in a meaningful way, in February last year, when Palo Alto Networks (PANW) pivoted to platformization. Their stock tumbled that day. Despite the ambiguous definition of “platformization” when PANW mentioned it, I understood their strategy.

Defence-in-Depth in cybersecurity wasn’t exactly working for many organizations. Cybersecurity point solutions peppered the landscape. There were so many leaks and gaps. Platformization, from PANW’s point of view, is the reverse C&C (command & control), if you know the cybersecurity speak. PANW wants to take charge of all things cybersecurity, end to end, and that made sense to me from a data perspective.

Paradigm shift for Data.

For the longest time, networked storage technology has been about data sharing, be it blocks, files or objects. The data from these protocols is delivered over the network, mostly over Fibre Channel and/or Ethernet (although I remember implementing NFS over Asynchronous Transfer Mode at Sarawak Shell in East Malaysia), in a client-server fashion.

From the late 2000s onwards, unified storage or multi-protocol storage (where the storage array is able to serve all 3: SAN, NAS and S3 services) was all the rage. All the prominent enterprise storage vendors had a solution or two in their solutions portfolio. I started viewing networked storage as a Data Services Platform, a concept I began explaining in 2017. Within the data services platform, various features revolve around my A.P.P.A.R.M.S.C. framework (I crafted the initial framework in 2000, thanks to Jon Toigo‘s book – The Holy Grail of Data Management). This framework, and the approach I used for my consulting and analyst work, worked well and are still relevant, even after 25 years.

But AI is changing the data landscape. AI is changing the way data is consumed and processed through the networks between the compute layer and the storage layer. It is indeed, for me, a paradigm shift of data, and the storage layer, better known as AI Data Infrastructure now, is shifting as well. And this shift will accelerate the exponential growth in innovations, with AI and super-charged data leading the way.

DDN Infinia Data Intelligence Platform (screencapture from DDN Beyond Artificial webinar)


Rethinking Storage OKRs for AI Data Infrastructure – Part 1

By cfheoh | January 6, 2025 - 8:00 am |January 13, 2025 Algorithm, Analytics, API, Appliance, Artificial Intelligence, Big Data, Cloud, Clusters, Composable Infrastructure, Containers, Data, Data Direct Networks, Data Management, DDN, Deep Learning, Dell, Digital Transformation, Filesystems, HDS, High Performance Computing, HPE, Infiniband, Kubernetes, Linux, Machine Learning, Mellanox Technologies, NetApp, nVidia, NVMe, Object Storage, Parallel NFS, Performance Benchmark, Pure Storage, Reliability, Scale-out architecture, Software Defined Storage, Storage Optimization, Vast Data, WekaIO


[ Preamble: This analysis focuses on my own journey as I incorporate my past experiences into this new market segment called AI Data Infrastructure, and gain new ones.

There are many elements of HPC (High Performance Computing) at play here. Even though speeds and feeds, features and functions crowd many conversations, as many enterprise storage vendors like them to, these conversations, in my opinion, are secondary. There are more vital operational and technical elements that an organization has to consider prudently, vis-à-vis ROIs (returns on investment). They involve asking the hard questions beyond the marketing hype and fluff. I call these elements of consideration Storage Objectives and Key Results (OKRs) for AI Data Infrastructure.

I had to break this blog into 2 parts. It has become TL;DR-ish. This is Part 1 ]

I have just passed my 6-month anniversary with DDN. Coming into the High Performance Storage System (HPSS) market segment, with its strong focus on the distributed parallel filesystem Lustre®, there was a steep learning curve for me. I spent over 3 decades in Enterprise Storage, working with some of the most advanced storage technologies in that market segment. And I had already developed my own approach to enterprise storage, based on the A.P.P.A.R.M.S.C. framework, which I have honed over the past 25 years.

The rapid adoption of AI has created a technology paradigm shift. Artificial Intelligence (AI) came in and blurred many lines. It has also been evolving my thinking when it comes to storage for AI. There is a paradigm shift in my own thoughts, opinions and experiences as well.

AI has brought HPSS technologies like Lustre® in the DDN EXAscaler platform, proven in the Supercomputing world, to a new realm: the AI Data Infrastructure market segment. On the other side, many enterprise storage vendors aspire to supply the AI Data Infrastructure opportunities as well. This convergence comes from the top storage performers for Supercomputing, the likes of DDN, IBM® (through Storage Scale) and HPE® (through Cray, which, by the way, often uses the open-source Lustre® edition in its storage portfolio); from the software-defined storage players such as Weka IO, Vast Data and MinIO; and from the enterprise storage array vendors such as NetApp®, Pure Storage® and Dell®.

[ Note that I take care not to name every storage vendor for AI because many either OEM, or repackage and rebrand, SDS technology into their gear, such as HPE® GreenLake for Files and Hitachi® IQ. You can Google to find out who the original vendor is for each. There are others as well. ]

In these 3 simplified categories (HPSS, SDS, Enterprise Storage Array), I have begun to see a pattern of each calling its technology “AI Data Infrastructure”. At the same time, I am also developing a new set of storage conversations for the AI Data Infrastructure market segment, one that is based on OKRs (Objectives and Key Results) rather than just features, features and more features that many SDS and enterprise storage vendors like to tout. Here are a few things end users should look for when they are considering a high-speed storage solution for their AI journey.

AI Data Infrastructure

GPU is king

In the AI world, the GPU infrastructure is the deity at the altar. The utilization rate of the GPUs must be kept as high as possible to get the maximum compute infrastructure return-on-investment (ROI). Keeping the GPUs resolutely busy is a must. HPSS is very much part of that ecosystem.
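To make “keeping the GPUs busy” concrete, here is a toy model, assuming each training step needs one batch that takes `t_io` seconds to read from storage and `t_compute` seconds to process on the GPU. The function name and all numbers are hypothetical illustrations, not DDN figures or benchmark results.

```python
def gpu_utilization(t_compute: float, t_io: float, overlapped: bool) -> float:
    """Fraction of wall-clock time the GPU spends computing per training step.

    Toy model (an assumption, not a benchmark): each step needs one batch
    that takes t_io seconds to read from storage and t_compute seconds to
    process on the GPU.
    """
    if t_compute <= 0 or t_io < 0:
        raise ValueError("t_compute must be positive and t_io non-negative")
    if overlapped:
        # With prefetching, the next batch is read while the current one is
        # computed, so a step lasts as long as the slower of the two stages.
        step_time = max(t_compute, t_io)
    else:
        # Without overlap, the GPU idles while every read completes.
        step_time = t_compute + t_io
    return t_compute / step_time


if __name__ == "__main__":
    # Hypothetical workload: 100 ms of GPU work per batch.
    for t_io in (0.02, 0.10, 0.25):
        serial = gpu_utilization(0.10, t_io, overlapped=False)
        prefetch = gpu_utilization(0.10, t_io, overlapped=True)
        print(f"t_io={t_io:.2f}s  serial={serial:.0%}  prefetch={prefetch:.0%}")
```

In this sketch, prefetching hides storage latency only while reads are faster than compute; once `t_io` exceeds `t_compute`, utilization falls no matter how clever the pipeline is, and only a faster storage data path brings it back up.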

These are a few OKRs I would consider for the storage or data infrastructure for AI:

Reliability
Speed
Power Efficiency
Security

Let’s look at each one of them from the point of view of a storage practitioner like me.


AI and the Data Factory

By cfheoh | November 19, 2024 - 6:13 am |November 19, 2024 Algorithm, Analytics, API, Appliance, Artificial Intelligence, Cloud, Clusters, Composable Infrastructure, Data, Data Direct Networks, Data Governance, Data Management, Data Privacy, Data Protection, Data Security, DDN, Deep Learning, Digital Transformation, Filesystems, Hadoop Clusters, High Performance Computing, Lustre, Machine Learning, Mellanox, Mellanox Technologies, Minio, nVidia, Object Storage, Parallel NFS, Performance Benchmark, Performance Caching, RDMA, Scale-out architecture, Storage Optimization


When I first heard the term “AI Factory”, the world was blaring Jensen Huang‘s keynote at NVIDIA GTC24. I thought those were cool words, since he mentioned the raw material of water going into the factory to produce electricity. The analogy was spot on for the AI we are building.

As I engage with many DDN partners and end users in the region, week in, week out, the term “AI Factory” keeps popping up in conversations. Yet, many still do not know how to go about building this “AI Factory”. They only know they need to buy GPUs, lots of them. These companies’ AI ambitions are unabated. IDC predicts that worldwide spending on AI will double by 2028, and yet, the ROI (return on investment) remains elusive.

At the ground level, based on many conversations so far, the common theme is that the steps to begin building the AI Factory are ambiguous and fuzzy to most. I would like to share my views from a data storage point of view. Hence, my take on the Data Factory for AI.

Are you AI-ready?

We have to have a plan, but before we take the first step, we must look at where we stand at the present moment. We know the proverbial first step: to train AI, we need lots of data. Deep Learning (DL), Large Language Models (LLMs) and Generative AI (GenAI) all need tons of data.

If the company knows where it is, it will know which phase is next. So, in the AI Maturity Model (I simplified the diagram below), where is your company now? Are you AI-ready?

Simplified AI Maturity Model

Get the Data Strategy Right

In his interview with CRN, MinIO’s CEO AB Periasamy said “For generative AI, they realized that buying more GPUs without a coherent data strategy meant GPUs are going to idle out”. I was struck by his wisdom about having a coherent data strategy because that is absolutely true. This is my starting point: having the right data strategy.

In the AI world, from a data storage guy’s point of view, data is the fuel. Data is the raw material that Jensen alluded to, if it was not already obvious. We have heard this anecdotal quote many times before, even before the AI phenomenon took over. AI is data-driven. Data is vital for the ROI of AI projects. And thus, we must look from the point of view of the data to make the AI Factory successful.


Accelerated Data Paths of High Performance Storage is the Cornerstone of building AI

By cfheoh | September 16, 2024 - 7:30 am |September 16, 2024 100Gigabit Ethernet, Algorithm, Analytics, API, Appliance, Artificial Intelligence, Big Data, Cloud, Clusters, Containers, DDN, Deep Learning, Fibre Channel, Filesystems, Flash, High Performance Computing, Infiniband, Lustre, Machine Learning, Mellanox, Mellanox Technologies, Nutanix, NVMe, Parallel NFS, Performance Benchmark, Performance Caching, pNFS, RDMA, Scale-out architecture, Storage Optimization


I am now 2 months into my new role at DDN as a Solutions Architect. With many revolving doors around me, I have been trying to find the essence, the critical cog of the data infrastructure that supports the accelerated computing of the Nvidia GPU clusters. The more I read and engaged, the clearer a pattern emerged. I found that cog in the supercharged data paths between the storage infrastructure systems and the GPU clusters. I will share more.

To set the context, let me start with a wonderful article I read in CIO.com back in July 2024. It was titled “Storage: The unsung hero of AI deployments“. It was music to my ears because, as a long-time practitioner in the storage technology industry, I believe it is time the storage industry gets the credit it deserves.

What is the data path?

To put it simply, a Data Path, from a storage context, is the communication route taken by the data bits between the compute system’s processing and program memory and the storage subsystem. The links and the established sessions can be within the system components such as the PCIe bus or external to the system through the shared networking infrastructure.
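A data path defined this way is a chain of links, some internal (such as the PCIe bus) and some external (such as the network), and end to end it cannot move data faster than its slowest link. Below is a minimal sketch of that rule; the `Link` class, the link names and the speeds are hypothetical illustrations, and the figures are rough line rates rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hop on a storage data path (names and speeds are illustrative)."""
    name: str
    gbytes_per_s: float


def bottleneck(links: list[Link]) -> Link:
    """Return the slowest hop: a serial data path can deliver no more than
    its slowest link, however fast the other links are."""
    if not links:
        raise ValueError("a data path needs at least one link")
    return min(links, key=lambda link: link.gbytes_per_s)


if __name__ == "__main__":
    # Hypothetical path from storage media to a compute node's memory.
    path = [
        Link("storage media (hypothetical aggregate)", 28.0),
        Link("400 Gb/s network port (~50 GB/s line rate)", 50.0),
        Link("PCIe Gen4 x16 host slot (~32 GB/s)", 32.0),
    ]
    slowest = bottleneck(path)
    print(f"Bottleneck: {slowest.name} at {slowest.gbytes_per_s} GB/s")
```

Upgrading any link other than the bottleneck leaves end-to-end throughput unchanged in this model, which is why accelerating a data path means looking at every hop between storage and compute.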

High speed accelerated data paths

In the world of accelerated computing such as AI and HPC, there are additional, more advanced technologies to deliver the data bits even faster. These are the accelerated data paths between the compute nodes and the storage subsystems. Following on, I share a few of these technologies that are less used in the enterprise storage segment.


Storage IO straight to GPU

By cfheoh | July 5, 2021 - 9:00 am |July 3, 2021 100Gigabit Ethernet, Algorithm, Analytics, API, Artificial Intelligence, Composable Infrastructure, compression, CXL, Deduplication, Deep Learning, Filesystems, High Performance Computing, Hyperconvergence, Machine Learning, Mellanox, Mellanox Technologies, Microsoft, nVidia, NVMe, RDMA, Vast Data, WekaIO


The parallel processing power of the GPU (Graphics Processing Unit) cannot be denied. One year ago, nVidia® overtook Intel® in market capitalization. And today [as of July 2, 2021], nVidia®’s market cap is more than double Intel®’s: USD$510.53 billion vs USD$229.19 billion.

Thus it is not surprising that storage architectures are changing from the CPU-centric paradigm to take advantage of the burgeoning prowess of the GPU. And 2 announcements in the storage news in recent weeks have caught my attention – Windows 11 DirectStorage API and nVidia® Magnum IO GPUDirect® Storage.

nVidia GPU

Exciting the gamers

The Windows DirectStorage API feature is only available in Windows 11. It was announced last year as part of the Xbox® Velocity Architecture to take advantage of the high I/O capability of modern NVMe SSDs. DirectStorage-enabled applications and games use several technologies, such as Direct3D (D3D) decompression/compression algorithms designed for the GPU, and Sampler Feedback Streaming (SFS), which uses the results of previously rendered frames to decide which higher-resolution textures to load into GPU memory and render, for a real-time gaming experience.


Is Software Defined right for Storage?

By cfheoh | April 19, 2021 - 9:00 am |April 18, 2021 Acquisition, API, Appliance, ATA over Ethernet, Ceph, Composable Infrastructure, Coraid, Cray Inc, Data Direct Networks, Deduplication, deduplication, Filesystems, FreeNAS, Gluster, HDS, High Performance Computing, HPE Simplivity, Hyperconvergence, Infiniband, Intel, iXsystems, Liqid, Mellanox Technologies, Minio, NetApp, Nexenta, Nutanix, nVidia, OpenIO, Openstack, Oracle, PCIe, Performance Benchmark, Performance Caching, Pure Storage, RAID, RDMA, Redhat, Simplivity, SNIA, SoftIron, Software Defined Storage, Solaris, Solid State Devices, Storage Optimization, TrueNAS, Vast Data, Virtualization


George Herbert Leigh Mallory, mountaineer extraordinaire, was once asked “Why did you want to climb Mount Everest?“, to which he replied “Because it’s there“. That retort demonstrated the indomitable human spirit and probably best exemplified the human desire to conquer the physical limits of nature. The software of humanity versus the hardware of the planet Earth.

By juxtaposition, similar things can be said about software and hardware in computer systems, and in storage technology per se. There are a few schools of thought when it comes to delivering storage services, the notable ones being the storage appliance model and the software-defined storage model.

There are arguments, of course. Some are genuinely partisan, but many a time these arguments come in the form of the flavour of the moment. I have seen my past companies tout the storage appliance model very strongly in the beginning, only to switch to a “software company” chorus years later. That was what I meant by the “flavour of the moment”.

Software Defined Storage


The Edge is coming! The Edge is coming!

By cfheoh | October 12, 2020 - 9:15 am |October 11, 2020 100Gigabit Ethernet, Analytics, Big Data, Containers, Data, Deep Learning, Edge Computing, Flash, Industry 4.0, InfluxDB, Linux, Machine Learning, Mellanox, Mellanox Technologies, Minio, nVidia, NVMe, Pravega, SNIA, Solid State Devices


Actually, Edge Computing is already here. It has been on everyone’s lips for quite some time, but for me and for many others, Edge Computing is still a hodgepodge of many things. The proliferation of devices, IoT, sensors and endpoints being pulled into the ubiquitous term of Edge Computing has made the scope ever-changing and difficult to pin down. And it is this proliferation of edge devices that will generate voluminous amounts of data. Obvious questions emerge:

How do you store all the data?
How do you process all the data?
How do you derive competitive value from the data from these edge devices?
How do you securely transfer and share the data?

From the storage technology perspective, it might be easier to observe the traits of the data generated on edge devices. In this blog, we also look at some new storage technologies out there that could be part of Edge Computing, present and future.

Edge Computing overview – Cloud to Edge to Endpoint

Storage at the Edge

The mantra of putting compute as close to the data as possible, and processing it where it is stored, is the main crux right now, at least where storage of the data is concerned. The round-trip latency from the edge devices to the computing resources in the cloud and back is not conducive, and in many older settings, the edge devices in factories may not even be network-enabled. In my last encounter several years ago, there were more than 40 interfaces, specifications and protocols, most of them proprietary, for the edge devices. And there is no industry-wide standard for these edge devices either.


Dell EMC Isilon is an Emmy winner!

By cfheoh | March 16, 2020 - 7:41 am |March 17, 2020 100Gigabit Ethernet, Acquisition, Analytics, Appliance, CIFS, Cloud, Clusters, Containers, Data Availability, Deduplication, deduplication, Deep Learning, Dell, DellEMC, Disks, EMC, Flash, Gartner, High Performance Computing, Isilon, Mellanox, Mellanox Technologies, NAS, NetApp, NFS, Performance Caching, Pure Storage, Qumulo, Scale-out architecture, SMB, Snapshots, Software Defined Storage, Solid State Devices, Storage Field Day, Storage Market Share, Storage Optimization, Storage Tiering, Tech Field Day, WekaIO


[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer, and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog reflects my own opinions and views ]

And the Emmy® goes to …

Yes, the Emmy® goes to Dell EMC Isilon! It was indeed a well deserved accolade and an honour!

Dell EMC Isilon had just won the Technology & Engineering Emmy® Awards a week before Storage Field Day 19, for their outstanding pioneering work on the NAS platform tiering technology of media and broadcasting content according to business value.

A lasting true clustered NAS

This is not a blog to praise Isilon but one that pays respect to a true clustered, scale-out file system. I have known of OneFS for a long time, but never really took the opportunity to put my hands on it since 2006 (there is a story). So here is a look at history …

Back in the early to mid-2000s, there was a lot of talk about large scale NAS. There were several players in the nascent scaling NAS market. NetApp was the filer king, with several competitors such as Polyserve, Ibrix, Spinnaker, Panasas and the young upstart Isilon. There were also Procom, BlueArc and NetApp’s predecessor Auspex. By the second half of the 2000s, the market had consolidated and most of these NAS players were acquired.

NetApp acquired Spinnaker in 2003
Part of Auspex was acquired by NetApp in 2003; the other part by Glasshouse Technologies
Procom was picked up by Sun Microsystems in 2005
Polyserve went to HP in 2007
Ibrix joined HP as well in 2009
Isilon got acquired by EMC in 2010
BlueArc was gobbled up by HDS in 2011


Is General Purpose Object Storage disenfranchised?

By cfheoh | December 23, 2019 - 5:40 pm |January 14, 2020 100Gigabit Ethernet, Amazon Web Services, Analytics, API, Artificial Intelligence, Big Data, BYOD, Ceph, Cloud, Cloudian, Clusters, Deep Learning, DellEMC, Docker, Dropbox, Edge Computing, Filesystems, Flash, Gartner, Hadoop, HDS, High Performance Computing, Hitachi Vantara, IDC, Industry 4.0, IoT, Lustre, Machine Learning, Mellanox Technologies, Minio, NetApp, Object Storage, OpenIO, Openstack, Performance Benchmark, Reliability, Scale-out architecture, Software Defined Storage, Storage Field Day, Storage Market Share, swiftstack, Tape storage, Tech Field Day


[Disclosure: I am invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees will be covered by GestaltIT, the organizer, and I am not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog reflects my own opinions and views]

This is NOT an advertisement for coloured balls.

This is the license to brag for the vendors in the next 2 weeks or so, as we approach the 2020 new year. This, of course, is the latest 2019 IDC Marketscape for Object-based Storage, released last week.

My object storage mentions

I have written extensively about Object Storage since 2011. With different angles and perspectives, here are some of them:

The Future is Intelligent Objects (2011)
What should be Cloud Storage? (2011)
APIs that stick in Storage (2012)
Has Object Storage become the Everything Store? (2013)
Of Object Storage, Filesystems and Multicloud (2017)
My Dilemma of Stateful Storage Marriage (2018)
The Malaysian Openstack Storage Conundrum (2018)
Sleepless in Malaysia with Object Storage (2019)
The Waning Light of Openstack Swift (2019)


Storage Performance Considerations for AI Data Paths

By cfheoh | June 17, 2019 - 10:50 am |June 17, 2019 100Gigabit Ethernet, Algorithm, Analytics, API, Artificial Intelligence, Big Data, Cloud, Composable Infrastructure, Data, Data Fabric, Data Management, Data Privacy, Data Security, Digital Transformation, Drivescale, E8 Storage, Edge Computing, Elastifile, Excelero, Filesystems, High Performance Computing, Hyperconvergence, Industry 4.0, Infiniband, Intel, Liqid, Lustre, Machine Learning, Mellanox Technologies, NVMe, Object Storage, Performance Benchmark, Performance Caching, Quantum Corporation, RDMA, Software-defined Datacenter, Storage Optimization, Storage Tiering, ThinkParq, Vast Data, Virtualization, WekaIO


The hype of Deep Learning (DL), Machine Learning (ML) and Artificial Intelligence (AI) has reached an unprecedented frenzy. Every infrastructure vendor from servers, to networking, to storage has a word to say or play about DL/ML/AI. This prompted me to explore this hyped ecosystem from a storage perspective, notably from a storage performance requirement point-of-view.

One question on my mind

There are plenty of questions on my mind. One stood out and that is related to storage performance requirements.

Reading and learning from one storage technology vendor to another, the context of everyone’s play against their competitors seems to be “They are archaic, they are legacy. Our architecture is built from ground up, modern, NVMe-enabled“. And there is more of this juxtaposing, but you get the picture – “We are better, no doubt“.

Are the data patterns and behaviours of AI different? How do they affect the storage design as the data moves through the workflow, the data paths and the lifecycle of the AI ecosystem?


Category Archives: Mellanox Technologies