Rethinking Storage OKRs for AI Data Infrastructure – Part 2

[ Preamble: This analysis focuses on my own journey as I incorporate my experiences into this new market segment called AI Data Infrastructure. There are many elements of HPC (High Performance Computing) at play here. Even though speeds and feeds, features and functions crowd many conversations, as they do with many enterprise storage vendors, these conversations are, in my opinion, secondary. There are more vital operational technology and technical elements that an organization has to consider prudently. They involve asking the hard questions beyond the marketing hype and fluff. I call these elements of consideration Storage Objectives and Key Results (OKRs) for AI Data Infrastructure.

I had to break this blog into 2 parts. It has become TL;DR-ish. This is Part 2 ]

This is a continuation of Part 1 of my blog last week. I spoke about the 4 key OKRs (Objectives and Key Results) we look at from the storage point of view with regard to AI data infrastructure. To recap, they are:

  • Reliability
  • Speed
  • Power Efficiency
  • Security

Power Efficiency

Patrick Kennedy of ServeTheHome (STH) fame astutely explained the new generation of data center racks required by NVIDIA® and AMD® in his article “Here is how much Power we expect AMD® and NVIDIA® racks will need in 2027” 2 weeks ago. Today, the NVIDIA® GB200 NVL72 ORv3 rack design draws 120kW per rack. That’s an insane amount of power consumption that can only go up in the next 2-3 years. That is why power efficiency must be an OKR metric to be deeply evaluated.

When you operate a GPU compute farm, whether it is 8 GPUs or 16,384 GPUs, keeping operations tight is vital, and power efficiency has to be right up there with the rest of the operational OKRs. Power consumption becomes a cost factor in the data infrastructure design for AI.

Two very important units of measurement I would look into, and that have become valuable OKRs to achieve, are Performance per Watt (Performance/Watt) and Performance per Rack Unit (Performance/RU).
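To make these two ratios concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the system names, the throughput, power and rack-unit figures are made-up placeholders and not measurements of any product, and "performance" is taken to mean sustained throughput in GB/s (it could just as well be IOPS).

```python
from dataclasses import dataclass


@dataclass
class StorageSystem:
    """Hypothetical storage system used only to illustrate the two ratios."""
    name: str
    throughput_gbs: float  # sustained throughput in GB/s (placeholder value)
    power_watts: float     # power draw under load in watts (placeholder value)
    rack_units: int        # rack space occupied, in RU (placeholder value)

    def perf_per_watt(self) -> float:
        # Performance/Watt: throughput delivered for each watt consumed
        return self.throughput_gbs / self.power_watts

    def perf_per_ru(self) -> float:
        # Performance/RU: throughput delivered for each rack unit occupied
        return self.throughput_gbs / self.rack_units


# Two invented systems to compare side by side
systems = [
    StorageSystem("system-a", throughput_gbs=100.0, power_watts=2000.0, rack_units=4),
    StorageSystem("system-b", throughput_gbs=120.0, power_watts=4000.0, rack_units=8),
]

for s in systems:
    print(f"{s.name}: {s.perf_per_watt():.3f} GB/s per watt, "
          f"{s.perf_per_ru():.1f} GB/s per RU")
```

In this made-up comparison, system-b is faster in absolute terms, yet system-a does more work per watt and per rack unit. That is the kind of conversation these two OKRs are meant to start.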

Power Efficiency in Data Center is a Must.

I built a 6-node Gluster cluster with TrueNAS SCALE

I haven’t had hands-on time with Gluster for over a decade. My last blog about Gluster was in 2011, right after I did a proof-of-concept for the now-defunct Jaring, Malaysia’s first ISP (Internet Service Provider). But I followed Gluster’s development on and off, until I found out that Gluster was a feature in the then-upcoming TrueNAS® SCALE. That was almost 2 years ago, just before I accepted the offer to join iXsystems™, my present employer.

The eagerness to test drive Gluster (again) on TrueNAS® SCALE has always been there, but I waited for SCALE to become GA. GA finally came on February 22, 2022. My plans for the test rig were laid out, and in the past few weeks, I have been diligently re-learning and putting together the scope to build a 6-node Gluster clustered storage with TrueNAS® SCALE VMs on VirtualBox®.

Gluster on OpenZFS with TrueNAS SCALE

Before we continue, I must warn that this is not pretty. I have limited computing resources in my homelab, but Gluster worked beautifully once I ironed out the inefficiencies. Secondly, this is not a performance test either, for obvious reasons. So, these are the annals, along with the trials and tribulations, of my 6-node Gluster cluster test rig on TrueNAS® SCALE.

Sassy Cato

I am not a cybersecurity guy at all. Cybersecurity, to me, is a hodgepodge of many things. It is complex and it is confusing. But to every organization that has to deal with cloud SaaS (software-as-a-service) applications, mobile devices, work from home, and the proliferation of network connections from everywhere to the edge and back, strong cybersecurity without the burden of sluggish performance and without the complexity of stitching together cybersecurity point solutions would be a godsend.

About 3 1/2 years ago, when I was an independent consultant, I was asked by a friend to help him (I was also looking for a gig) sell a product. It was Aryaka Networks, an SD-WAN solution. It was new to me, although I had some MPLS (multiprotocol label switching) knowledge from some point in my career. But the experience with Aryaka at the people level was not too encouraging, with several of the people I was dealing with switching positions or leaving Aryaka, including their CEO at the time, John Peters. After about 4 months or so, my friend lost confidence and decided to switch to Cato Networks.

Cato Networks opened up my eyes to what I believe cybersecurity should be. Simple, performant, and with many of the previous point requirements like firewall, VPN, zero trust networks, identity management, intrusion prevention, application gateways, threat detection and response, remote access, WAN acceleration and several more, all beautifully crafted into a single cloud-based service. There was an enlightenment moment for a greenhorn like me as I learned more about the Cato solution. That singularity of distributed global networking and cybersecurity blew me away.

What the heck is Storage Modernization?

We often hear the word “modernization” thrown around these days. The push is to get the end user to refresh their infrastructure, and the storage infrastructure market is rife with the word. Is your storage ripe for “modernization”?

Many possibilities to modernize storage

To modernize, it has to be relative to legacy storage hardware, and the operating environment that came with it. But if the so-called “legacy” still does the job, should you modernize?

Big Data is right

When the term “Big Data” came into prominence a while back, it stirred the IT industry into a frenzy. At one point, Apache Hadoop became the poster elephant (pun intended) for this exciting new segment. So many Vs came out, but I settled on 4 Vs as the framework of my IT conversations. The 4 Vs we often hear are:

  • Volume
  • Velocity
  • Variety
  • Veracity

Is Software Defined right for Storage?

George Herbert Leigh Mallory, mountaineer extraordinaire, was once asked “Why did you want to climb Mount Everest?”, to which he replied “Because it’s there”. That retort demonstrated the indomitable human spirit and probably best exemplified the relationship between the human being’s desire to conquer and the physical limits of nature. The software of humanity versus the hardware of the planet Earth.

By juxtaposition, similar things can be said about software and hardware in computer systems, and in storage technology in particular. There are a few schools of thought when it comes to delivering storage services, the notable ones being the storage appliance model and the software-defined storage model.

There are arguments, of course. Some are genuinely partisan but many a time, these arguments come in the form of the flavour of the moment. I have seen past companies of mine tout the storage appliance model very strongly in the beginning, only to switch to a “software company” chorus years later. That is what I mean by the “flavour of the moment”.

Software Defined Storage

The prudence needed for storage technology companies

Blitzscaling has been on my mind a lot. Ever since I discovered that word a while back, it has returned time and time again to fill my thoughts. In the wake of COVID-19, and in the mire of this devastating pandemic, is blitzscaling still the right strategy for this generation of storage technology, hyperconverged, data management and cloud storage startups?

What the heck is Blitzscaling? 

For the uninformed, here’s a video of Reid Hoffman, co-founder of LinkedIn and a member of the PayPal Mafia, explaining Blitzscaling.

Blitzscaling is about hyper-growth, scaling ultra fast and rocketing to escape velocity, at the expense of things like management efficiency, financial prudence and profits. While this blog focuses on storage companies, blitzscaling is probably most recognizable in the massive expansion (and contraction) of Uber a few years ago. In the US, the ride hailing war is between Uber and Lyft, but over here in Southeast Asia, just a few years back, it was between Uber and Grab. In China it was Uber and Didi.

From the storage angle, 2 segments exemplified the blitzscaling culture between 2015 and 2020.

  • All Flash Startups
  • Hyper Converged Infrastructure Startups

FUDs for Real

We human beings hate losing. It creates a psychological anxiety about not owning something and about missing out on something. That “something” could be adulation, attention, rewards and many other things that seem to enrich our superficial lives. FOMO (fear of missing out) is real, ladies and gentlemen.

When we see a sign like the one below, what do you think is going through our minds?

Limited Time Offer sign

OMG! I got to get it because the deal is for a limited time only! It will never happen again in my lifetime and I gotta buy it!

The game of FUD

F.U.D. (Fear, Uncertainty and Doubt). We in the technology business have seen tons of tactics to entice someone to buy our products or switch allegiance to our side. And we do it ourselves too, consciously and unconsciously, intentionally or unintentionally. There is no denying that.

Based on what is known and what is unknown to us, we share the information that is available to us in ways meant to influence and to effect an outcome. But let it be known that we do not have a world view of everything, and thus we choose to believe what we see, hear and experience. As human beings, we cannot be 100% fearless, 100% certain that we are right, or 100% without doubt that what we do or act upon will be 100% correct. The outcome of FUD is to create a convoluted, messed-up thought process that will deliver the desired effect and action. It is the universal Law of Cause and Effect.

The effect the marketers want will speed up or delay your decision making, throw a spanner into your thought process, and illogically give you meaningless and meaningful (to your desires) heebie-jeebies. The feeling of loss or missing out creates “displaced anxiety”, a Freudian concept where the projected fear and emotions land on something that feels safer, even if the safer “target” may be irrelevant. And it is this irrelevant decision that marketers want you to take, because what they are selling is the “safer” decision.

Down the rabbit hole with Kubernetes Storage

Kubernetes is on fire. Last week VMware® released the State of Kubernetes 2020 report, which surveyed companies with 1,000 employees and above. The results were not surprising, as the adoption of this nascent technology is booming. But persistent storage remained the nagging concern for Kubernetes, which serves infrastructure resources to application instances running in the containers of a pod in a cluster.

The standardization of storage resources has settled on CSI (Container Storage Interface). Storage vendors have almost, kind of, sort of agreed that API objects such as PersistentVolumes, PersistentVolumeClaims and StorageClasses, along with their parameters, would be the way to request storage resources from the Pre-provisioned Volumes via the CSI driver plug-in. There are already more than 50 vendor-specific CSI drivers on GitHub.
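For readers who have not seen these API objects, here is a minimal sketch in Python that builds a StorageClass and a PersistentVolumeClaim as plain dictionaries and prints them as JSON, which is a format kubectl also accepts. The driver name csi.example.com, the object names, the fsType parameter and the 10Gi size are hypothetical placeholders of mine, not taken from any vendor's driver. Note that this sketch shows the StorageClass route, where a volume is provisioned on demand, rather than a pre-provisioned PersistentVolume.

```python
import json


def storage_class(name: str, provisioner: str, parameters: dict) -> dict:
    """Build a StorageClass manifest that points at a CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,  # name under which the CSI driver is registered
        "parameters": parameters,    # opaque key/values passed to the driver
    }


def persistent_volume_claim(name: str, storage_class_name: str, size: str) -> dict:
    """Build a PersistentVolumeClaim that requests storage from a StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names and values, for illustration only
sc = storage_class("fast-block", "csi.example.com", {"fsType": "ext4"})
pvc = persistent_volume_claim("app-data", sc["metadata"]["name"], "10Gi")

for manifest in (sc, pvc):
    print(json.dumps(manifest, indent=2))
```

The application only ever references the claim by name. Which vendor's storage sits behind the StorageClass is decided by the provisioner field, and that separation is what lets the CSI drivers plug in.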

Kubernetes and CSI initiative

Kubernetes and the CSI (Container Storage Interface) logos

The CSI plug-in method is the only way for Kubernetes to scale and keep its dynamic, loadable storage resource integration with external 3rd party vendors, all clamouring to grab a piece of this burgeoning demand both in the cloud and in the enterprise.
