The All-Important Storage Appliance Mindset for HPC and AI projects

I am a strong believer in using the right tool to do the job right. I said this 2 years ago in my blog “Stating the case for a Storage Appliance approach”. It was written when I was working for an open source storage company. And I am an advocate of the crafter versus assembler mindset, especially in the enterprise and high-performance storage technology segments.

I have since joined DDN, and that same mindset has not changed a bit. I have been saying all along that the storage appliance model should always be the mindset for the businesses’ peace of mind.

My view of the storage appliance model began almost 25 years ago. I came into the NAS systems world via Sun Microsystems®. Sun was famous for running NFS servers on general Sun Solaris servers: NFS services on Unix systems. Back then, I remember arguing with one of the Sun distributors about the merits of running NFS over 100Mbit/sec Ethernet on Sun servers. I was drinking Sun’s Kool-Aid big time.

When I joined Network Appliance® (now NetApp®) in 2000, my worldview of putting software on general purpose servers changed. Network Appliance® had one product family, the FAS700 (720, 740, 760) family. All NetApp® did in the beginning was serve NFS. They were the NAS filers and nothing else.

I was completely sold on the appliance way with NetApp®. Firstly, it was the very first time I learned that such network storage services could be provisioned with an appliance concept. This was different from Sun. I was used to managing NFS exports on a Sun SPARCstation 20 to Unix clients in the network.

Secondly, my mindset began to take shape: “you have to have the right tool to do the job correctly and extremely well”. Well, the toaster toasts bread very well and nothing else. And the fridge (an analogy used by Dave Hitz, I think) does what it does very well too. That is what the appliance does. You definitely cannot grill a steak with a bread toaster, just like you can’t run excellent, ultra-high performance storage services to serve demanding AI and HPC applications on a general server platform. You have to have a storage appliance solution for High-Speed Storage.

That little Network Appliance® toaster award given out to exemplary employees still stands vividly in my mind. The NetApp® tagline back then was “Fast, Simple, Reliable”. That solidified my mindset for high-speed storage in the AI and HPC projects of the present.

DDN AI400X2 Turbo Appliance

Costs, Benefits and Risks

I like to think about what the end users are thinking about. There are investment costs involved, and along with them, risks to the investments as well as their benefits. Let’s just simplify and lump them into a Cost-Benefits-Risk analysis triangle. These variables come into play in the decision making of AI and HPC projects.


Preliminary Data Taxonomy at ingestion. An opportunity for Computational Storage

Data governance has been on my mind a lot lately. With all the incessant talk and hype about Artificial Intelligence, the true value of AI comes from good data. Therefore, it is vital for any organization embarking on its AI journey to have good quality data. And the lifecycle of data in an organization starts at the point of ingestion, the source where data is either created or acquired before it is presented to the processing workflow and data pipelines for AI training and onwards to AI applications.

In biology, taxonomy is the scientific study and practice of naming, defining and classifying biological organisms based on shared characteristics.

And so begins my argument for meshing these 3 topics together – data ingestion, data taxonomy and Computational Storage. Here goes my storage punditry.

Data Taxonomy in post-ingestion

I see that data, any data, has to arrive at a repository first before it is given meaning, context and specifications. These requirements are different from file permissions, ownerships, and ctime and atime timestamps; the content of the ingested data stream is simply made to fit the mould of the repository the data is written to. Metadata about the content of the data gives the data meaning, context and, most importantly, value as it is used within the data lifecycle. However, the metadata tagging and the data preparation in the ETL (extract transform load) or the ELT (extract load transform) process are only applied post-ingestion. This data preparation phase, in which data is enriched with content metadata, tagging, taxonomy and classification, is expensive in terms of resources, time and currency.
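To make the distinction concrete, here is a minimal Python sketch of the two kinds of metadata. Everything in it is hypothetical and for illustration only: the `ContentMetadata` fields, the `classify_post_ingestion` helper and its toy keyword rule are my assumptions, not any product's schema. It shows why the post-ingestion step costs more: the data has to be read back before it can be classified.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class FilesystemMetadata:
    """What the repository records on arrival, regardless of content."""
    size_bytes: int
    mode: int
    ctime: float
    atime: float


@dataclass
class ContentMetadata:
    """Hypothetical taxonomy fields; real schemas will differ."""
    domain: str
    classification: str
    tags: list = field(default_factory=list)


def filesystem_metadata(path: str) -> FilesystemMetadata:
    # Comes for free from the file system at write time.
    st = os.stat(path)
    return FilesystemMetadata(st.st_size, st.st_mode, st.st_ctime, st.st_atime)


def classify_post_ingestion(path: str) -> ContentMetadata:
    # Toy rule (an assumption): the data must be read back in full
    # before it can be tagged, which is the extra cost of post-ingestion.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    if "patient" in text.lower():
        return ContentMetadata("healthcare", "restricted", ["pii"])
    return ContentMetadata("general", "internal")
```

The file system fills in the first record at write time; the second record only exists after a separate pass over the data.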

Elements of a modern event-driven architecture including data ingestion (Credit: Qlik)

Even in the burgeoning times of open table formats (Apache Iceberg, Hudi, Delta Lake, et al.), open big data file formats (Avro, Parquet) and open data formats (CSV, XML, JSON, et al.), the format specifications with added context and meanings are added in and augmented post-ingestion.


Deploying a MinIO SNMD Object Storage Server in TrueNAS SCALE

[ Preamble ] This deployment of MinIO SNMD (single node multi drive) object storage server on TrueNAS® SCALE 24.04 (codename “Dragonfish”) is experimental. I am just deploying this in my home lab for the fun of it. Do not deploy in any production environment.

I have been contemplating this for quite a while. Which MinIO deployment mode on TrueNAS® SCALE should I work on? For one, there are 3 modes – Standalone, SNMD (Single Node Multi Drive) and MNMD (Multi Node Multi Drive). Of course, the ideal lab experiment is the MNMD deployment, the MinIO cluster, and I am still experimenting with this on my meagre lab resources.

In the end, I decided to implement SNMD since this is, most likely, deployed on top of a TrueNAS® SCALE storage appliance instead of on x86 bare metal or in a Kubernetes cluster on Linux systems. Incidentally, the concept of MNMD on top of TrueNAS® SCALE is “Kubernetes cluster”-like, albeit on a different container platform. At the same time, if this is deployed on TrueNAS® SCALE Enterprise, a dual-controller TrueNAS® storage appliance, the appliance will take care of the availability of the “MinIO nodes” with its active-passive HA architecture. Otherwise, it can be a full MinIO cluster distributed across several TrueNAS® storage appliances (minimum 4 nodes in a 2+2 erasure set) in an MNMD deployment scheme.
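As a rough illustration of what a 2+2 erasure set means for capacity, here is a small Python sketch of the generic erasure-coding arithmetic. It assumes one shard per node and a hypothetical 10 TB per node, and it does not model MinIO's actual read and write quorum behaviour; check the MinIO documentation for that.

```python
def erasure_set_summary(data_shards: int, parity_shards: int, drive_tb: float) -> dict:
    """Generic erasure-coding arithmetic for one erasure set."""
    total_shards = data_shards + parity_shards
    return {
        "raw_tb": total_shards * drive_tb,
        # Only the data shards hold usable capacity.
        "usable_tb": data_shards * drive_tb,
        "efficiency": data_shards / total_shards,
        # Data stays reconstructable with up to this many shards lost.
        "shard_losses_tolerated": parity_shards,
    }


# 4 nodes in a 2+2 erasure set, 10 TB each (hypothetical sizes)
summary = erasure_set_summary(data_shards=2, parity_shards=2, drive_tb=10.0)
print(summary)
```

With 2 data and 2 parity shards, half of the raw capacity is usable. That is the price paid for being able to reconstruct data after losing up to two shards.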

Ideally, the MNMD deployment should look like this:

MinIO distributed multi-node cluster architecture (credit: MinIO)


Making Immutability the key factor in a Resilient Data Protection strategy

We often hear the term “Cyber Resilience” thrown around these days. Every backup vendor has a cybersecurity play nowadays. Many have morphed into cyber resilience warrior vendors, and there is a great amount of validation for Cyber Resilience in the data protection world. Don’t believe me?

Check out this Tech Field Day podcast video from a month ago, where my friends, Tom Hollingsworth and Max Mortillaro discussed the topic meticulously with Krista Macomber, who has just become the Research Director for Cybersecurity at The Futurum Group (Congrats, Krista!).

Cyber Resilience, as well articulated in the video, is not old wine in a new bottle. The data protection landscape has changed so significantly since the emergence of cyber threats and ransomware that it warrants the coining of the Cyber Resilience terminology.

But I want to talk about one very important cog in the data protection strategy, of which cyber resilience is a part. That is Immutability, because it is super important to always consider immutable backups as part of that strategy.

It is no longer 3-2-1 anymore, Toto. 

When it comes to backup, I always start with the 3-2-1 backup rule: 3 copies of the data; 2 different media; 1 offsite. This rule has been ingrained in me since the day I entered the industry over 3 decades ago. It is still the most important opening line for a data protection specialist or a solution architect. 3-2-1 is table stakes.
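The rule is simple enough to express as a check. Below is a minimal Python sketch, assuming the list of copies includes the primary copy; the `BackupCopy` type and the media labels are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str      # e.g. "disk", "tape", "object" (illustrative labels)
    offsite: bool


def satisfies_3_2_1(copies: list) -> bool:
    """3 copies of the data, on 2 different media, with 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    BackupCopy("primary", "disk", offsite=False),
    BackupCopy("local backup", "disk", offsite=False),
    BackupCopy("vaulted tape", "tape", offsite=True),
]
print(satisfies_3_2_1(plan))
```

Dropping the vaulted tape from `plan` fails all three conditions at once, which shows how much of the rule a single offsite copy on different media carries.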

Yet, over the years, the cybersecurity threat landscape has moved closer and closer to the data protection, backup and recovery realm. This is now a merged super-segment Pangaea called cyber resilience. With it, the conversation around the 3-2-1 backup rule has, in these last few years, evolved into something like the 3-2-1-1-0 backup rule, a modern take on the 3-2-1 backup rule. Let’s take a look at the 3-2-1-1-0 rule (simplified by me).

The 3-2-1-1-0 Backup rule (Credit: https://www.dataprise.com/services/disaster-recovery/baas/)


Storage does not mean Capacity only

I was listening to several storage luminaries in Gestalt IT’s podcast “No one understands Storage anymore” a few weeks ago. Around the 11:09 mark in the podcast, Dr. J. Metz, SNIA® Chair, brought up this powerful quote: “Storage does not mean Capacity”. It struck me, and not in a funny way. It is what it is, and it is something I have wanted to say to many who do not understand the storage solutions they are purchasing. It exemplifies what is wrong in many organizations today in their understanding of investing in a storage infrastructure project.

This is my pet peeve. The first words uttered in most, if not all, storage requirements in my line of work are, “I want this many Terabytes of storage”. There are no other details or context about the other requirement factors, such as availability, performance, future growth, etc. Or even the goals to achieve when purchasing a storage system and operating it. What is the improvement they are looking for? What are the problems to solve?

Where is the OKR?

It pains me to say this. Of the folks who have been in the IT industry for years, both end users and IT purveyors alike, most are absolutely clueless about OKRs (Objectives and Key Results) for their storage infrastructure project. Many cannot frame the data challenges they are facing, and they have no idea where to go next. There is no alignment. There is no strategy. Even worse, there is no concept of how their storage infrastructure investments will improve their business and operations.

Just the other day, one company director from a renowned IT integrator here in Malaysia came calling. He has been in the IT industry since 1989 (I checked his LinkedIn profile), and he was asking for a 100TB storage quote. I asked a few questions about availability, performance and scalability; the usual questions a regular IT guy would ask. He had no idea, and instead of telling me he didn’t know, he gave me a runaround of this and that. Plenty of yada, yada nonsense.

In the end, I told him to buy a consumer grade storage appliance from Taiwan. I will just let him make a fool of himself in front of his customer since he didn’t want to take accountability for ensuring his customer gets a proper enterprise storage solution in good faith. His customer is probably in the same mould as well.

Defensive Strategies as Data Foundations

A strong storage infrastructure foundation is vital for good Data Credibility. If you do the right things for your data, there is Data Value, and it will serve your business well. Both Data Credibility and Data Value create confidence. And Confidence equates to Trust.

In order to create the defensive strategies, let’s look at storage Availability, Protection, Accessibility, Management, Security and Compliance. These are 6 of the 8 data points of the A.P.P.A.R.M.S.C. framework.

Offensive Strategies as Competitive Advantage

Once we have achieved stability of the storage infrastructure foundation, then we can turn over and drive towards storage Performance, Recovery, plus things like Scalability and Agility.

With a strong data infrastructure foundation, the organization can embark on the offensive and begin its business transformation journey, knowing that its data is well run, protected and performing.

Alignment with Data and Business Goals

Why do the defensive and offensive strategies require alignment to business goals?

The fact is simple. It is about improving the business and operations, and setting OKRs is key to measuring the ROI (return on investment) of getting the storage systems and the solutions in place. It is about switching the cost-fearing (negative) mindset to a profit-conviction (positive) mindset.

For example, maybe the availability of the data to the business is poor. Maybe there is the need to have access to the data 24×7, because the business is going online. The simple measurable fact is we can move availability from 95% uptime to 99.99% uptime with an HA storage system.

Perhaps there are concerns about recoverability in the deluge of ransomware threats. Setting new RPO goals from 24 hours to 4 hours is a measurable objective to enhance data resiliency.

Or getting the storage systems to deliver higher performance from 350 IOPS to 5000 IOPS for the database.

What I am saying here is that these data points are measurable, and they can serve as checkpoints for business and operational improvements. From a management perspective, these can be used as KPIs (key performance indicators) to define continuous improvement of Data Confidence.
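To show how measurable the availability example is, here is a short Python sketch that converts an uptime percentage into downtime per year. It assumes a 365-day year and treats all downtime as unplanned; the function name is mine.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a system is unavailable at a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


before = annual_downtime_hours(95.0)
after = annual_downtime_hours(99.99)

print(f"95% uptime:    about {before:.0f} hours of downtime per year")
print(f"99.99% uptime: about {after * 60:.1f} minutes of downtime per year")
```

Moving from 95% to 99.99% takes the business from roughly 438 hours (more than 18 days) of downtime a year to under an hour. That is the kind of number that belongs on an OKR.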

Furthermore, it is easier when an OKR dashboard is used to map the improvement markers as organizations use storage to move from point A to point B, where B equates to a new success milestone. The alignment sets the paths to the business targets.

Storage does not mean only Capacity

The sad part is that OKRs and the alignment of measured goals are glaringly missing in the minds of many organizations purchasing a storage infrastructure and data management solution. The people tasked with sourcing a storage technology solution are not setting goals and objectives. Capacity appears to be the only thing on their minds.

I am about to meet a customer’s procurement officer soon. She asked me this question over email: “Why is your storage so expensive?” I want to change her mindset, just like that of the many officers and C-levels who hold the purse strings.

Let’s frame the use of storage infrastructure in the real world. Nobody buys a storage system just to keep data in there, much like a puddle keeps stagnant water. Sooner or later the value of the data in the storage evaporates or becomes dull if the data is not used well in any way, shape or form.

Storage systems and the interconnected pathways from on premises, to the next premises, to the edge and to the clouds serve the greater good for Data. Data is used, shared, shaped, improved, enhanced, protected, moved, and more to deliver Value to the Business.

Storage capacity is just one of several factors to consider when investing in a storage infrastructure solution. In fact, capacity is probably the least important piece when considering a storage solution to achieve the company’s OKRs. If we think about it more deeply, setting the foundation for Data in the defensive manner will help elevate the value of the data, which the offensive strategies can then promote to gain the competitive advantage.

Storage infrastructure and storage solutions along with data management platforms may appear to be a cost to the annual budgets. If you set the OKRs, define A to get to B, and align the goals, storage infrastructure and the data management platforms and practices are investments that are worth their weight in gold. That is my guarantee.

On the flip side, ignoring and avoiding OKRs, and setting strategies without prudence, will bring its comeuppance. Technical debt will prevail.

Rant over.

Disaggregation and Composability vital for AI/DL models to scale

New generations of applications and workloads like AI/DL (Artificial Intelligence/Deep Learning), and HPC (High Performance Computing) are breaking the seams of entrenched storage infrastructure models and frameworks. We cannot continue to scale-up or scale-out the storage infrastructure to meet these inundating fluctuating I/O demands. It is time to look at another storage architecture type of infrastructure technology – Composable Infrastructure Architecture.

Infrastructure is changing. The previously staid infrastructure architecture of compute, network and storage has long been thrown out of the window, precipitated by the rise of x86 server virtualization almost 20 years ago. It triggered a tsunami of virtualizing everything, including storage virtualization, which eventually found a more current nomenclature – Software Defined Storage. Storage virtualization and software defined storage (SDS) are similar and yet different, and should be viewed through different contexts and similar goals. This TechTarget article laid out both nicely.

As virtualization raged on, converged infrastructure (CI), which evolved into hyperconverged infrastructure (HCI), hit fever pitch for a while. Companies like Maxta, Pivot3 and Atlantis are pretty much gone, with HPE® SimpliVity and Cisco® HyperFlex occasionally blipping on my radar. In a market that matured very fast, HCI is now dominated by Nutanix™ and VMware®, with smaller players Microsoft® and Dell EMC® following them.

From HCI, the attention of virtualization has shifted to something more granular and more scalable in containerization. Despite a degree of complexity, containerization is taking agility and scalability to the next level. Kubernetes and Docker are now mainstay nomenclature for infrastructure engineers and DevOps. So what is driving composable infrastructure? Have we reached the end of virtualization? Not really.

Evolution of infrastructure. Source: IDC

It is just that one part of the infrastructure landscape is changing. This new generation of AI/ML workloads is flipping the coin to the other side of virtualization. As we see in the diagram above, IDC brought this mindset change to get us to Think Composability, the next phase of Infrastructure.


Understanding security practices in File Synchronization

Ho hum. Another day, and another data leak. What else is new?

The latest hullabaloo on my radar was from one of Malaysia’s revered universities, UiTM, which reported a data leak of 11,891 student applicants’ private details, including the MyKad (national identity card) number of each individual. Reading the news article, one can deduce that the unsecured link mentioned was probably from a cloud storage service, i.e. file synchronization software such as OneDrive, Google Drive, Dropbox, etc. Those are files that can be easily shared via an HTTP/S URL link. Ah, convenience over data security best practices.

Cloud File Sync software

It irks me when data security practices are poorly practised. And it is likely that there is ignorance of data security practices in the first place.

It also irks me when many end users everywhere I have encountered tell me their file synchronization software is backup. That is just a very poor excuse of a data protection strategy, if any, especially in enterprise and cloud environments. Convenience, set-and-forget mentality. Out of sight. Out of mind. Right? 
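A toy Python model makes the difference plain. It is a deliberately simplified sketch: the `sync` function and `VersionedBackup` class are hypothetical, and some sync services do keep version history or a recycle bin, which this model ignores.

```python
def sync(source: dict) -> dict:
    # A sync replica mirrors the current state of the source,
    # so deletions and corruption are mirrored too.
    return dict(source)


class VersionedBackup:
    """Keeps point-in-time copies that later changes cannot touch."""

    def __init__(self) -> None:
        self.snapshots = []

    def take_snapshot(self, source: dict) -> None:
        self.snapshots.append(dict(source))

    def restore(self, index: int) -> dict:
        return dict(self.snapshots[index])


laptop = {"thesis.docx": "final draft"}
backup = VersionedBackup()
backup.take_snapshot(laptop)
replica = sync(laptop)

# Ransomware (or a slip of the finger) damages the source...
laptop["thesis.docx"] = "ENCRYPTED"
replica = sync(laptop)          # ...and the damage is faithfully synchronized.
restored = backup.restore(0)    # The earlier snapshot is unaffected.

print(replica["thesis.docx"], "|", restored["thesis.docx"])
```

The replica is only ever as good as the source at this moment, whereas the backup can return the data as it was before the damage.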

Convenience is not data security. File Sync is NOT Backup

Many users are used to the convenience of file synchronization. The proliferation of cloud storage services with free Gigabytes here and there has created an IT segment based on BYOD, which transformed into EFSS, and now CCP. The buzzword salad starts with Bring-Your-Own-Device, which evolved into Enterprise-File-Sync-&-Share, and in these later years, Content-Collaboration-Platform.

All these are fine and good. The data industry is growing up, and many are leveraging the power of file synchronization technologies, be it on-premises or from cloud storage services. Organizations, large and small, are able to use these file synchronization platforms to enhance their businesses and digitally transform their operational efficiencies and practices. But what is sorely missing in embracing the convenience and simplicity is the much ignored cybersecurity housekeeping that should be keeping our files and data safe.


Reverting the Cloud First mindset

When cloud computing was all the rage, every business wanted to be on board. Those who resisted felt the heat as the FOMO (fear of missing out) feeling set in, especially those who were doing this thing called “Digital Transformation”. The public cloud service providers took advantage of the cloud computing frenzy, calling for a “Cloud First” strategy. For a number of years, the marketing worked. The cloud first mentality was on the tip of many tongues, driving droves to cloud adoption.

All this was fine and dandy, but recently we are beginning to hear and read about a few high profile cases of cloud repatriation. DHH’s journal of Basecamp’s exit from AWS in late 2022 reverberated strongly, sounding what should be a wake-up call for those caught in the Cloud Computing Hotel California’s gilded cage. An even more bizarre claim about cost savings of $400 million over 3 years was made by Ahrefs, a Singapore SEO software maker which chose to use a co-location facility instead of a public cloud service.

Cloud First is not Cool (not sure where the source is from, but I got this off Twitter some months ago)

While these big news jailbreaks are going against the grain, most are still in that rush to jump into cloud services everywhere. In droves, even. But, on and off, I am beginning to hear some gripes, grunts and groans from end users in the cloud. This news has emboldened some to think that there is another choice besides shifting all IT and data services to the cloud.


Project COSI

S3 (Simple Storage Service) has become the de facto standard for accessing object storage. Many vendors claim 100% compatibility with S3, but from what I know, several file storage services’ integration and validation work with S3 has revealed otherwise. There are certain nuances that have derailed some of the more advanced integrations. I shall not reveal the ones that I know of, but let us use this thought as a basis of our discussion of Project COSI in this blog.

Project COSI high level architecture

What is Project COSI?

COSI stands for Container Object Storage Interface. It is still an alpha stage project in Kubernetes version 1.25 as of September 2022, whilst the latest version of Kubernetes today is version 1.26. To understand the objectives of COSI, one must understand the journey and the challenges of persistent storage for containers and Kubernetes.

For me at least, there have been arduous arguments about provisioning a storage repository that keeps the data persistent (and permanent) after containers in a Kubernetes pod have stopped, or been replicated to another cluster. And for now, many storage vendors in the industry have settled on the CSI (container storage interface) framework when it comes to data persistence using file-based and block-based storage. You can find a long list of CSI drivers here.

However, you would think that since object storage is the most native storage for containers and Kubernetes pods, there would already be a consistent way of accessing object storage services. From the objectives set out by Project COSI, it turns out that there isn’t a standard way to provision and access object storage comparable to the CSI framework for file-based and block-based storage. So the COSI objectives were set to:

  • Kubernetes Native – Use the Kubernetes API to provision, configure and manage buckets
  • Self Service – A clear delineation between administration and operations (DevOps) to enable self-service capability for DevOps personnel
  • Portability – Vendor neutrality enabled through portability across Kubernetes Clusters and across Object Storage vendors

Further details describing Project COSI can be found here at the Kubernetes site titled “Introducing COSI: Object Storage Management using Kubernetes API“.

Standardization equals technology adoption

Standardization means consistency, control and confidence. The higher the standardization across the storage and containerized apps industry, the higher the adoption of the technology. And given what I have heard from the industry over these few years, Kubernetes, to me, even to this day, is a platform and a framework riddled with so many moving parts. Many of the components look the same, feel the same and sound the same, but might not work out the same when deployed.

Therefore, the COSI standardization work is important and critical to growing this burgeoning segment, especially when we are rocketing towards disaggregation of computing service units, resources that can be orchestrated to scale up or down at the execution of code. Infrastructure-as-Code (IaC) is becoming more of a reality with each passing day, and object storage is at the heart of this transformation for Kubernetes and containers.


Open Source on my mind

Last week, topics around Open Source software kept cropping up. I want to voice my opinions here (with a bit of ranting) and hope not to rouse too many hostile comments from different parties and views. This blog is to create conversations, even controversial ones, but we must first agree that there will be disagreements. We must accept disagreements as part of this conversation.

In my 30-year career, Open Source has been a big part of my development and progress. The ideas of freely using (certain) software without any licensing implications, and of this software being openly available, were not always as welcome as they are now. I think the Open Source revolution has created an innovation movement that is still going strong. It has not only permeated completely into the IT industry; Open Source is also now in almost every part of the technology-based industries as well. The Open Source influence is massive.

Open Source word cloud

In the beginning

In the beginning, in my beginning in 1992, the world of software and its source code was a closed one. Coming from a VAX/VMS background (I was a system admin for my mathematics department’s minicomputers), Unix liberated my thinking. The final 6 months in the university were spent on systems programming in C, and it completely changed how I wanted my career to take shape. The mantra of “Free as in Freedom” in the General Public License (GPL), which I got to know of much later, sat well with my own tenets in life.

If closed source development models led to proprietary software and a centralized way of distributing software with licenses, I would count the Open Source development models as one of the earliest decentralized technology frameworks. Down with the capitalistic corporations (aka Evil Empires)!

It was certainly a wonderful and generous way to make the world what it is today. It is a better world now.
