AI and the Data Factory

When I first heard the term “AI Factory”, the world was blaring Jensen Huang’s keynote at NVIDIA GTC24. I thought those were cool words, since he mentioned the raw material of water going into the factory to produce electricity. The analogy was spot on for the AI we are building.

As I engage with many DDN partners and end users in the region, week in, week out, the term “AI Factory” keeps popping into conversations. Yet, many still do not know how to go about building this “AI Factory”. They only know they need to buy GPUs, lots of them. These companies’ AI ambitions are unabated. IDC predicts that worldwide spending on AI will double by 2028, and yet the ROI (return on investment) remains elusive.

At the ground level, based on many conversations so far, the common theme is that the steps to begin building the AI Factory are ambiguous and fuzzy to most. I would like to share my views from a data storage point of view. Hence, my take on the Data Factory for AI.

Are you AI-ready?

We have to have a plan, but before we take the first step, we must look at where we are standing at the present moment. We know that to train AI, the proverbial first step is that we need lots of data. Deep Learning (DL), which powers Large Language Models (LLMs) and Generative AI (GenAI), needs tons of data.

If the company knows where it stands, it will know which phase is next. So, in the AI Maturity Model (I simplified the diagram below), where is your company now? Are you AI-ready?

Simplified AI Maturity Model

Get the Data Strategy Right

In his interview with CRN, MinIO’s CEO AB Periasamy said, “For generative AI, they realized that buying more GPUs without a coherent data strategy meant GPUs are going to idle out”. I was struck by his wisdom about having a coherent data strategy because that is absolutely true. This is my starting point: having the right Data Strategy.

In the AI world, from a data storage guy’s point of view, data is the fuel. Data is the raw material that Jensen alluded to, in case it was not obvious. We have heard this quote many times before, even before the AI phenomenon took over. AI is data-driven. Data is vital for the ROI of AI projects. And thus, we must look from the point of view of the data to make the AI Factory successful.


What next after Cyber Resiliency?

There was a time some years ago when some storage vendors, especially the object storage ones, started calling themselves the “last line of defence”. And even further back, when the purpose-built backup appliances (PBBAs) first appeared, a very smart friend of mine commented that they shouldn’t call it “backup appliance”, but rather they should call it “restore appliance”. That was because the data restoration part, or to be more relevant in today’s context, data recovery is the key to a crucial line of defence against cybersecurity threats to data, especially ransomware. We have a saying in the industry. “Hundreds of good backups are not as good as one good restore.” Of course, this data restoration part has become more sophisticated in the data recovery processes.

In recent years, we have also seen the amalgamation of both data protection species – the backup/restore side and the cybersecurity side – giving rise to the term Cyber Resilience and its proliferation.

Dialing Cyber Resilience (Picture from tehtris.com)

I have no qualms about, or lack of confidence in, the cyber resilience technologies. I am pretty sure they can do the job extremely well, so much so that some vendors give million-dollar guarantees should their solution ever fail. Druva announced their Data Resiliency Guarantee of USD$10 million and Rubrik has their Ransomware Recovery Warranty.

Of course, these warranties and guarantees come with terms, conditions and caveats, and not everyone is besotted with these big payout numbers. My friend, Andrew Martin, wrote a tongue-in-cheek piece last year about Rubrik’s warranty in his Data Storage Asia blog, which discussed whether it was Rubrik’s genuineness or spuriousness that might win or lose customers’ affections. You should read his blog to decide.


Making Immutability the key factor in a Resilient Data Protection strategy

We often hear the term “Cyber Resilience” thrown around these days. Every backup vendor has a cybersecurity play nowadays. Many have morphed into cyber resilience warrior vendors, and there is a great amount of validation for Cyber Resilience in the data protection world. Don’t believe me?

Check out this Tech Field Day podcast video from a month ago, where my friends, Tom Hollingsworth and Max Mortillaro discussed the topic meticulously with Krista Macomber, who has just become the Research Director for Cybersecurity at The Futurum Group (Congrats, Krista!).

Cyber Resilience, as well articulated in the video, is not old wine in a new bottle. The data protection landscape has changed so significantly since the emergence of cyber threats and ransomware that it warrants the coining of the Cyber Resilience terminology.

But I want to talk about one very important cog in the data protection strategy, of which cyber resilience is a part. That is Immutability, because it is super important to always consider immutable backups as part of that strategy.

It is no longer 3-2-1 anymore, Toto. 

When it comes to backup, I always start with the 3-2-1 backup rule. 3 copies of the data; 2 different media; 1 offsite. This rule has been ingrained in me since the day I entered the industry over 3 decades ago. It is still the most important opening line for a data protection specialist or a solution architect. 3-2-1 is table stakes.
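To make the rule concrete, here is a minimal Python sketch of a 3-2-1 check. The `BackupCopy` class, its field names and the sample copies are all hypothetical, made up for illustration, and the sketch assumes the production copy counts as one of the 3 copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (hypothetical structure for illustration)."""
    name: str      # label for the copy
    media: str     # media type, e.g. "disk", "tape", "object"
    offsite: bool  # True if the copy is kept away from the primary site


def meets_3_2_1(copies):
    """Return True if there are 3+ copies, on 2+ media types, with 1+ offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Sample data (assumed): production plus two backups.
copies = [
    BackupCopy("production", "disk", False),
    BackupCopy("local-backup", "disk", False),
    BackupCopy("vault", "tape", True),
]

print(meets_3_2_1(copies))
```

Dropping the tape copy from the list fails all three conditions at once, which is why the offsite copy on different media tends to carry so much of the rule.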

Yet, over the years, the cybersecurity threat landscape has moved closer and closer to the data protection, backup and recovery realm. This is now a merged super-segment Pangaea called cyber resilience. With it, the conversation around the 3-2-1 backup rule has, in these last few years, evolved into something like the 3-2-1-1-0 backup rule, a modern take on the 3-2-1 backup rule. Let’s take a look at the 3-2-1-1-0 rule (simplified by me).

The 3-2-1-1-0 Backup rule (Credit: https://www.dataprise.com/services/disaster-recovery/baas/)


Data Trust and Data Responsibility. Where we should be at before responsible AI.

Last week, there was a press release by Qlik™ about a study it sponsored, conducted by TechTarget®’s Enterprise Strategy Group (ESG), on the state of responsible AI practices across industries. The study highlighted critical gaps in the approach to responsible AI, ethical AI practices and AI regulatory compliance. From the study, Qlik™ emphasizes having a solid data foundation. To get to that bedrock foundation, we must trust the data and we must be responsible for the kinds of data that build that foundation. Hence, Data Trust and Data Responsibility.

There is an AI boom right now. Last year alone, the AI machine and its hype added USD$2.4 trillion in market cap to US tech companies. 5 months into 2024, AI is still supernova hot. And many are very much fixated on the infallible fables and tales of AI’s pompous splendour. It is with this blind faith that I see many users and vendors alike sidestepping the realities of AI in its present state.

AI is not always responsible. That raises the question, “Are we really working with a responsible set of AI applications and ecosystems?”

Responsible AI. Are we there yet?

AI still hallucinates, unfortunately. How AI applications arrive at a conclusion or a recommended decision is not always transparent. What if you had a conversation with ChatGPT and it said that you were dead? Well, that was exactly what happened when Tom’s Guide writer, Tony Polanco, found out from ChatGPT that he had passed away in September 2021.


NIST CSF 2.0 brings Data Governance into the light

Over the past weekend, I watched a CNA Insider video delving into Data Theft in Malaysia. It is titled “Data Theft in Malaysia: How your personal information may be exploited | Cyber Scammed”.

You can watch the 45-minute video below.

Such dire news is nothing new. We Malaysians are numb to those telemarketers calling and messaging to offer their credit card services, loans and health spa services. You name it; there is something to sell. Of course, these “services” are mostly innocuous, but in recent years, the scams have risen several notches in form and severity. The levels of sophistication, the impacts, and the damages (counting financial and human casualties) have rocketed. Along with the news, mainstream and otherwise, the levels of awareness of and interest in data, especially PII (personally identifiable information), among Malaysians are at their highest yet.

Yet the data theft continues unabated. Cybersecurity Malaysia (CSM), just last week, reported a 1,192% jump in data theft cases in Malaysia in 2023. In older news from last year, cybersecurity firm Surfshark ranked Malaysia as the 8th most breached country in Q3 of 2023.

Storage does not mean Capacity only

I was listening to several storage luminaries in GestaltIT’s podcast “No one understands Storage anymore” a few weeks ago. Around the 11:09 mark in the podcast, Dr. J. Metz, SNIA® Chair, brought up this powerful quote: “Storage does not mean Capacity”. It struck me, and not in a funny way. It is what it is, and it is something I have wanted to say to many who do not understand the storage solutions they are purchasing. It exemplifies what is wrong in many organizations today in their understanding of investing in a storage infrastructure project.

This is my pet peeve. The first words uttered in most, if not all, storage requirements in my line of work are, “I want this many Terabytes of storage”. There are no other details or context about the other requirement factors, such as availability, performance, future growth, etc. Nor even the goals to achieve when purchasing a storage system and operating it. What is the improvement they are looking for? What are the problems to solve?

Where is the OKR?

It pains me to say this. Among the folks who have been in the IT industry for years, both end users and IT purveyors alike, most are absolutely clueless about OKRs (Objectives and Key Results) for their storage infrastructure project. Many cannot frame the data challenges they are facing, and they have no idea where to go next. There is no alignment. There is no strategy. Even worse, there is no concept of how their storage infrastructure investments will improve their business and operations.

Just the other day, one company director from a renowned IT integrator here in Malaysia came calling. He has been in the IT industry since 1989 (I checked his LinkedIn profile), and he was asking for a 100TB storage quote. I asked a few questions about availability, performance, scalability; the usual questions a regular IT guy would ask. He had no idea, and instead of telling me he didn’t know, he gave me a runaround of this and that. Plenty of yada, yada nonsense.

In the end, I told him to buy a consumer grade storage appliance from Taiwan. I will just let him make a fool of himself in front of his customer, since he didn’t want to take accountability for ensuring his customer gets a proper enterprise storage solution in good faith. His customer is probably in the same mould as well.

Defensive Strategies as Data Foundations

A strong storage infrastructure foundation is vital for good Data Credibility. If you do the right things for your data, there is Data Value, and it will serve your business well. Both Data Credibility and Data Value create confidence. And Confidence equates to Trust.

In order to create the defensive strategies, let’s look at storage Availability, Protection, Accessibility, Management, Security and Compliance. These are 6 of the 8 data points of the A.P.P.A.R.M.S.C. framework.

Offensive Strategies as Competitive Advantage

Once we have achieved stability in the storage infrastructure foundation, we can turn around and drive towards storage Performance and Recovery, plus things like Scalability and Agility.

With a strong data infrastructure foundation, the organization can embark on the offensive and begin its business transformation journey, knowing that its data is well run, well protected, and performs.

Alignment with Data and Business Goals

Why do the defensive and offensive strategies require alignment to business goals?

The fact is simple. It is about improving the business and operations, and setting OKRs is key to measuring the ROI (return on investment) of getting the storage systems and the solutions in place. It is about switching the cost-fearing (negative) mindset to a profit-conviction (positive) mindset.

For example, maybe the availability of the data to the business is poor. Maybe there is the need to have access to the data 24×7, because the business is going online. The simple measurable fact is we can move availability from 95% uptime to 99.99% uptime with an HA storage system.
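To put those two uptime figures in perspective, here is a small Python sketch that converts an uptime percentage into allowable downtime per year. It assumes a 365-day year of continuous 24×7 operation; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumption)


def annual_downtime_hours(uptime_percent):
    """Hours of downtime per year allowed at a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for uptime in (95.0, 99.99):
    hours = annual_downtime_hours(uptime)
    print(f"{uptime}% uptime allows about {hours:.2f} hours of downtime per year")
```

Under these assumptions, 95% uptime works out to roughly 438 hours (about 18 days) of downtime a year, while 99.99% allows under an hour. That gap is the kind of number a business going online 24×7 can relate to.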

Perhaps there are concerns about recoverability in the deluge of ransomware threats. Setting new RPO goals from 24 hours to 4 hours is a measurable objective to enhance data resiliency.

Or getting the storage systems to deliver higher performance from 350 IOPS to 5000 IOPS for the database.

What I am saying here is that these data points are measurable, and they can serve as checkpoints for business and operational improvements. From a management perspective, these can be used as KPIs (key performance indicators) to define continuous improvement of Data Confidence.

Furthermore, it is easy when an OKR dashboard is used to map the improvement markers as organizations use storage to move from point A to point B, where B equates to a new success milestone. The alignment sets the paths to the business targets.
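As a rough illustration of what sits behind such a dashboard, the Python sketch below tracks progress from point A (baseline) to point B (target) for the three examples above. The `KeyResult` class is hypothetical, and the `current` values are invented measurements used only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    """One measurable key result (hypothetical structure for illustration)."""
    name: str
    baseline: float  # point A
    target: float    # point B
    current: float   # latest measurement (made-up value in this sketch)

    def progress(self):
        """Fraction of the way from baseline to target, clamped to 0..1."""
        span = self.target - self.baseline
        if span == 0:
            return 1.0
        fraction = (self.current - self.baseline) / span
        return max(0.0, min(1.0, fraction))


# Baselines and targets come from the examples above; current values are assumed.
key_results = [
    KeyResult("Availability (% uptime)", 95.0, 99.99, 99.9),
    KeyResult("RPO (hours)", 24.0, 4.0, 8.0),
    KeyResult("Database IOPS", 350.0, 5000.0, 2675.0),
]

for kr in key_results:
    print(f"{kr.name}: {kr.progress():.0%} of the way from A to B")
```

Dividing by the signed span means the same formula works whether the goal is to push a number up (IOPS, uptime) or bring it down (RPO).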

Storage does not mean only Capacity

The sad part is that OKRs and the alignment of measured goals are glaringly missing from the minds of many organizations purchasing a storage infrastructure and data management solution. The people tasked with sourcing a storage technology solution are not setting goals and objectives. Capacity appears to be the only thing on their minds.

I am about to meet a procurement officer of a customer soon. She asked me this question “Why is your storage so expensive?” over email. I want to change her mindset, just like the many officers and C-levels who hold the purse strings.

Let’s frame the use of storage infrastructure in the real world. Nobody buys a storage system just to keep data in there, much like a puddle keeps stagnant water. Sooner or later, the value of the data in the storage evaporates, or becomes dull, if the data is not used well in any way, shape or form.

Storage systems and the interconnected pathways from on premises, to the next premises, to the edge and to the clouds serve the greater good for Data. Data is used, shared, shaped, improved, enhanced, protected, moved, and more to deliver Value to the Business.

Storage capacity is just one of several factors to consider when investing in a storage infrastructure solution. In fact, capacity is probably the least important piece when considering a storage solution to achieve the company’s OKRs. If we think about it more deeply, setting the foundation for Data in the defensive manner will help elevate the value of the data, which the offensive strategies can then promote to gain the competitive advantage.

Storage infrastructure and storage solutions, along with data management platforms, may appear to be a cost to the annual budgets. But if you set the OKRs, define A to get to B, and align the goals, storage infrastructure and the data management platforms and practices are investments that are worth their weight in gold. That is my guarantee.

On the flip side, ignoring and avoiding OKRs, and setting the strategies without prudence, will bring its comeuppance. Technical debt will prevail.

Rant over.

Open Source Storage and Data Responsibility

There was a Super Blue Moon a few days ago. It was a rare sky show. Friends of mine who are photo and moon gazing enthusiasts were showing off their digital captures online. One ignorant friend, who was probably a bit envious of the other people’s attention, quipped that his Oppo Reno 10 Pro Plus can take better pictures. Oppo Reno 10 Pro Plus claims 3x optical zoom and 120x digital zoom. Yes, 120 times!

Yesterday, a WIRED article came out titled “How Much Detail of the Moon Can Your Smartphone Really Capture?” It was a very technical article. I thought the author did an excellent job explaining the physics behind his notes. But I also found the article funny, flippant even, when I juxtaposed it with what my envious friend was saying the other day about his phone’s camera.

Super Blue Moon 2023

Open Source storage expectations and outcomes

I work for iXsystems™. Open Source has been its DNA for over 30 years. Similarly, I have worked on Open Source (decades before it was called open source) in my home labs ever since I entered the industry. I had the SoftLanding Linux System on 3.5″ diskettes (Linux kernel 0.99), and I bought a boxed set of the FreeBSD OS from Walnut Creek (photo below). My motivation was to learn as much as possible about the information technology world because I was making my first steps in building my career in the IT industry (I was also quietly trying to prove my father wrong).

FreeBSD Boxed Set (circa 1993)

 

Open source has democratized technology. It has placed the power of very innovative technology into the hands of the common people. With Open Source, I saw the IT landscape changing as well, especially for home labbers like myself in the early years. Social media platforms, FAANG (Facebook, Apple, Amazon, Netflix, Google), etc., have amplified that power (to the people). But with that great power comes great responsibility. And some users with little technology background start to have hallucinated expectations and outcomes. Just like my friend with the “powerful” Oppo phone.

Likewise, in my world, I have plenty of anecdotes of these types of open source storage users having wild expectations, but too little skill to turn them into reality.


A Data Management culture to combat Ransomware

On the road, seat belts save lives. So do motorcycle helmets. But these 2 technologies alone would probably not be well received and well applied daily without a strong ecosystem and culture of road safety. For decades, there have been constant and unrelenting efforts to enforce the habits of putting on the seat belt or the helmet. Statistics have shown they reduce road fatalities, but like I said, it is the safety culture that made all this happen.

On the digital front, the ransomware threats are unabated. In fact, despite organizations (and individuals), both large and small, being more aware of cyber-hygiene practices than ever, the magnitude of ransomware attacks has multiplied. Threat actors still see weaknesses, gaps and vulnerabilities in the digital realms, and thus these remain lucrative ventures that reward their endeavours.

Time to look at Data Management

The Cost-Benefits-Risks Conundrum of Data Management

I have said this before. At a recent speaking engagement, I brought it up again. I said that ransomware is not a cybersecurity problem. Ransomware is a data management problem. I got blank stares from the crowd.

I get it. It is hard to convince people and companies to embrace a better data management culture. I thought about the Cost-Benefits-Risks triangle while I was analyzing the lack of data management culture in many organizations when combating ransomware.

I get it that Cybersecurity is big business. Even many of the storage guys I know want to jump onto the cybersecurity bandwagon. Many of the data protection vendors are already mashing their solutions with a cybersecurity twist. That is where the opportunities are, and where the cool kids hang out. I get it.

Cybersecurity technologies are more tangible than data management. I get it when the C-suites like to show off shiny new cybersecurity “toys” because they are allowed to brag. Oh, my company has just implemented security brand XXX, and it’s so cool! They can’t be telling their golf buddies that they have a new data management culture, can they? What’s that?
