ILM Archives - Storage Gaga

Preliminary Data Taxonomy at ingestion. An opportunity for Computational Storage

By cfheoh | June 3, 2024 - 8:18 am |June 3, 2024 Algorithm, Analytics, API, Artificial Intelligence, Big Data, Cloud, Clusters, Composable Infrastructure, compression, Containers, Data Fabric, Data Governance, Data Management, Data Security, deduplication, Deduplication, Deep Learning, Digital Transformation, eDiscovery, IBM, ILM, iRODS, Machine Learning, Scaleflux, Solid State Devices

Leave a comment

Data governance has been on my mind a lot lately. With all the incessant talks and hype about Artificial Intelligence, the true value of AI comes from good data. Therefore, it is vital for any organization embarking on their AI journey to have good quality data. And the journey of the lifecycle of data in an organization starts at the point of ingestion, the data source of how data is either created, acquired to be presented up into the processing workflow and data pipelines for AI training and onwards to AI applications.

In biology, taxonomy is the scientific study and practice of naming, defining and classifying biological organisms based on shared characteristics.

And so, begins my argument of meshing these 3 topics together – data ingestion, data taxonomy and with Computational Storage. Here goes my storage punditry.

Data Taxonomy in post-injection

I see that data, any data, has to arrive at a repository first before they are given meaning, context, specifications. These requirements are different from file permissions, ownerships, ctime and atime timestamps, the content of the ingested data stream are made to fit into the mould of the repository the data is written to. Metadata about the content of the data gives the data meaning, context and most importantly, value as it is used within the data lifecycle. However, the metadata tagging, and preparing the data in the ETL (extract load transform) or the ELT (extract load transform) process are only applied post-ingestion. This data preparation phase, in which data is enriched with content metadata, tagging, taxonomy and classification, is expensive, in term of resources, time and currency.

Elements of a modern event-driven architecture including data ingestion (Credit: Qlik)

Even in the burgeoning times of open table formats (Apache Iceberg, HUDI, Deltalake, et al), open big data file formats (Avro, Parquet) and open data formats (CSV, XML, JSON et.al), the format specifications with added context and meanings are added in and augmented post-injection.

Continue reading →

A Data Management culture to combat Ransomware

By cfheoh | July 10, 2023 - 8:00 am |July 10, 2023 Backup, Business Continuity, Data, Data Archiving, Data Availability, Data Corruption, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Disaster Recovery, EasiShare, ILM, iXsystems, NAS, Reliability, Security, Snapshots, Storage Optimization, Storage Tiering, Tape storage, TrueNAS

1 Comment

On the road, seat belt saves lives. So does the motorcycle helmet. But these 2 technologies alone are probably not well received and well applied daily unless there is a strong ecosystem and culture about road safety. For decades, there have been constant and unrelenting efforts to enforce the habits of putting on the seat belt or the helmet. Statistics have shown they reduce road fatalities, but like I said, it is the safety culture that made all this happen.

On the digital front, the ransomware threats are unabated. In fact, despite organizations (and individuals), both large and small, being more aware of cyber-hygiene practices more than ever, the magnitude of ransomware attacks has multiplied. Threat actors still see weaknesses and gaps, and vulnerabilities in the digital realms, and thus, these are lucrative ventures that compliment the endeavours.

Time to look at Data Management

The Cost-Benefits-Risks Conundrum of Data Management

And I have said this before in the past. At a recent speaking engagement, I brought it up again. I said that ransomware is not a cybersecurity problem. Ransomware is a data management problem. I got blank stares from the crowd.

I get it. It is hard to convince people and companies to embrace a better data management culture. I think about the Cost-Benefits-Risk triangle while I was analyzing the lack of data management culture used in many organizations when combating ransomware.

I get it that Cybersecurity is big business. Even many of the storage guys I know wanted to jump into the cybersecurity bandwagon. Many of the data protection vendors are already mashing their solutions with a cybersecurity twist. That is where the opportunities are, and where the cool kids hang out. I get it.

Cybersecurity technologies are more tangible than data management. I get it when the C-suites like to show off shiny new cybersecurity “toys” because they are allowed to brag. Oh, my company has just implemented security brand XXX, and it’s so cool! They can’t be telling their golf buddies that they have a new data management culture, can they? What’s that?

Continue reading →

Is there no end to the threat of ransomware?

By cfheoh | June 20, 2022 - 8:00 am |June 20, 2022 Appliance, Artificial Intelligence, Backup, Business Continuity, Cloud, Cohesity, Data, Data Archiving, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Digital Transformation, Disaster Recovery, Druva, Filesystems, HDS, Hitachi Vantara, ILM, iRODS, Racktop Systems, Reliability, Rubrik, SASE, Security, Sophos, Storage Field Day, Tech Field Day

Leave a comment

I find it blasphemous that with all the rhetoric of data protection and cybersecurity technologies and solutions in the market today, the ransomware threats and damages have grown proportionately larger each year. In a recent report by Kaspersky on Anti-Ransomware Day May 12th, 9 out of 10 of organizations previously attacked by ransomware are willing to pay again if attacked again. A day before my scheduled talk in Surabaya East Java 2 weeks’ back, the chatter through the grapevine was one bank in Indonesia was attacked by ransomware on that day. These news proved how virulent and dangerous the ransomware scourge is and has become.

And the question that everyone wants an answer to is … why are ransomware threats getting bigger and more harmful and there are no solutions to it?

Digital transformation and its data are very attractive targets

Today, all we hear from the data protection and storage vendors are recovery, restore that data blah, blah, blah and more blah, blah, blahs. The end point EDR (endpoint detection and response) solutions say they can stop it; the cybersecurity experts preach depth in defense; and the network security guys say use perimeter fencing. And the anti-phishing chaps say more awareness and education required. One or all have not worked effectively these few years. Ransomware’s threats and damages are getting worse. Why?

Continue reading →

Unstructured Data Observability with Datadobi StorageMAP

By cfheoh | May 23, 2022 - 8:00 am |May 22, 2022 Artificial Intelligence, Backup, CIFS, Cloud, Data Archiving, Data Management, Data Privacy, Data Protection, Data Security, Datadobi, Digital Transformation, eDiscovery, Filesystems, FreeNAS, ILM, iXsystems, Minio, NAS, NFS, Object Storage, SMB, Storage Tiering, TrueNAS

Leave a comment

Let’s face it. Data is bursting through its storage seams. And every organization now is storing too much data that they don’t know they have.

By 2025, IDC predicts that 80% the world’s data will be unstructured. IDC‘s report Global Datasphere Forecast 2021-2025 will see the global data creation and replication capacity expand to 181 zettabytes, an unfathomable figure. Organizations are inundated. They struggle with data growth, with little understanding of what data they have, where the data is residing, what to do with the data, and how to manage the voluminous data deluge.

The simple knee-jerk action is to store it in cloud object storage where the price of storage is $0.0000xxx/GB/month. But many IT departments in these organizations often overlook the fact that that the data they have parked in the cloud require movement between the cloud and on-premises. I have been involved in numerous discussions where the customers realized that they moved the data in the cloud moved too frequently. Often it was an erred judgement or short term blindness (blinded by the cheap storage costs no doubt), further exacerbated by the pandemic. These oversights have resulted in expensive and painful monthly API calls and egress fees. Welcome to reality. Suddenly the cheap cloud storage doesn’t sound so cheap after all.

The same can said about storing non-active unstructured data on primary storage. Many organizations have not been disciplined to practise good data management. The primary Tier 1 storage becomes bloated over time, grinding sluggishly as the data capacity grows. I/O processing becomes painfully slow and backup takes longer and longer. Sounds familiar?

The A in ABC

I brought up the ABC mantra a few blogs ago. A is for Archive First. It is part of my data protection consulting practice conversation repertoire, and I use it often to advise IT organizations to be smart with their data management. Before archiving (some folks like to call it tiering, but I am not going down that argument today), we must know what to archive. We cannot blindly send all sorts of junk data to the secondary or tertiary storage premises. If we do that, it is akin to digging another hole to fill up the first hole.

We must know which unstructured data to move replicate or sync from the Tier 1 storage to a second (or third) less taxing storage premises. We must be able to see this data, observe its behaviour over time, and decide the best data management practice to apply to this data. Take note that I said best data management practice and not best storage location in the previous sentence. There has to be a clear distinction that a data management strategy is more prudent than to a “best” storage premises. The reason is many organizations are ignorantly thinking the best storage location (the thought of the “cheapest” always seems to creep up) is a good strategy while ignoring the fact that data is like water. It moves from premises to premises, from on-prem to cloud, cloud to other cloud. Data mobility is a variable in data management.

Continue reading →

How well do you know your data and the storage platform that processes the data

By cfheoh | December 20, 2021 - 7:30 am |December 20, 2021 100Gigabit Ethernet, 10Gigabit Ethernet, Algorithm, Analytics, Appliance, Backup, Big Data, Business Continuity, Cloud, Clusters, Composable Infrastructure, compression, Confluent, Data Archiving, Data Availability, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deduplication, Digital Transformation, Disaster Recovery, Edge Computing, Filesystems, Hyperconvergence, ILM, Industry 4.0, InfluxDB, iRODS, Machine Learning, NAS, NFS, NVMe, Object Storage, Performance Caching, Pravega, Reliability, SATA, Scale-out architecture, Security, Software Defined Storage, Storage Area Network, Storage Optimization, Storage Tiering, Unified Storage, VDI, Virtualization

Leave a comment

Last week was consumed by many conversations on this topic. I was quite jaded, really. Unfortunately many still take a very simplistic view of all the storage technology, or should I say over-marketing of the storage technology. So much so that the end users make incredible assumptions of the benefits of a storage array or software defined storage platform or even cloud storage. And too often caveats of turning on a feature and tuning a configuration to the max are discarded or neglected. Regards for good storage and data management best practices? What’s that?

I share some of my thoughts handling conversations like these and try to set the right expectations rather than overhype a feature or a function in the data storage services.

Complex data networks and the storage services that serve it

I/O Characteristics

Applications and workloads (A&W) read and write from the data storage services platforms. These could be local DAS (direct access storage), network storage arrays in SAN and NAS, and now objects, or from cloud storage services. Regardless of structured or unstructured data, different A&Ws have different behavioural I/O patterns in accessing data from storage. Therefore storage has to be configured at best to match these patterns, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:

Random and Sequential patterns
Block sizes of these A&Ws ranging from typically 4K to 1024K.
Causal effects of synchronous and asynchronous I/Os to and from the storage

Continue reading →

At the mercy of the cloud deity

By cfheoh | December 13, 2021 - 8:00 am |December 11, 2021 Amazon, Amazon Web Services, API, Backup, Big Data, Business Continuity, Cloud, Data Availability, Data Management, Data Protection, Data Security, Disaster Recovery, Gartner, Google, IBM, ILM, Microsoft Azure, Oracle Cloud, Reliability, Software-defined Datacenter

Leave a comment

Amazon Web Services (AWS) went down in the middle of last week. News of the outage were mentioned:

AWS Management Console unavailable error

Piling the misery

The AWS outage headlines attract the naysayers, the fickle armchair pundits, and the opportunists. Here are a few news articles that bring these folks to chastise the cloud giant.

Of course, I am one of these critics. I don’t deny that I am not. But I read this situation from a multicloud hyperbole of which I am not a fan. Too much multicloud whitewashing by vendors trying to pitch multicloud as a disaster recovery solution without understanding that this is easier said than done.

Continue reading →

The Starbucks model for Storage-as-a-Service

By cfheoh | October 11, 2021 - 8:00 am |October 9, 2021 Amazon Web Services, API, Appliance, Artificial Intelligence, Big Data, Cloud, Data Archiving, Data Availability, Data Corruption, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, FreeNAS, Google Anthos, ILM, iXsystems, Kubernetes, Nextcloud, Pravega, Pure Storage, Software Defined Storage, TrueNAS

Leave a comment

Starbucks™ is not a coffee shop. It purveys beyond coffee and tea, and food and puts together the yuppie beverages experience. The intention is to get the customers to stay as long as they can, and keep purchasing the Starbucks’ smorgasbord of high margin provisions in volume. Wifi, ambience, status, coffee or tea with your name on it (plenty of jokes and meme there), energetic baristas and servers, fancy coffee roasts and beans et. al. All part of the Starbucks™-as-a-Service pleasurable affair that intends to lock the customer in and have them keep coming back.

The Starbucks experience

Data is heavy and they know it

Unlike compute and network infrastructures, storage infrastructures holds data persistently and permanently. Data has to land on a piece of storage medium. Coupled that with the fact that data is heavy, forever growing and data has gravity, you have a perfect recipe for lock-in. All storage purveyors, whether they are on-premises data center enterprise storage or public cloud storage, and in between, there are many, many methods to keep the data chained to a storage technology or a storage service for a long time. The storage-as-a-service is like tying the cow to the stake and keeps on milking it. This business model is very sticky. This stickiness is also a lock-in mechanism.

Continue reading →

Enterprise Storage is not just a Label

By cfheoh | July 26, 2021 - 9:00 am |July 23, 2021 Appliance, Cloud, Data, Data Archiving, Data Availability, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Digital Transformation, ILM, Jon Toigo, RAID, Reliability, SATA, Security, Software Defined Storage, Unified Storage, Virtualization

3 Comments

I have many anecdotes around the topic of Enterprise Storage, but the conversations in the past 2 weeks made it important for me to share this.

Enterprise Storage is …

Amusing, painful, angry

I get riled up whenever people do not want to be educated about Enterprise Storage. Here are a few that happened in the last 2 weeks.

[ Story #1 ]

A guy was building his own storage for cryptocurrency. He was informed by his supplier that the RAID card was enterprise, and he could get the best performance using “Enterprise” RAID-0.

Well, “Enterprise” RAID-0 volume crashed, and he lost all data. Painfully, he said he lost a hefty sum financially

[ Story #2 ]

A media company complained about the reliability of previous storage vendor. The GM was shopping around and was told that there are “Enterprise” SATA drives and the reliability is as good, if not better than SAS drives.

The company wanted a fully reliable Enterprise Storage system with 99.999% availability, and yet the SATA interface was not meant to build a more highly reliable enterprise storage. The GM insisted to use “Enterprise” SATA drives for his “enterprise” storage system instead of SAS.

[ Story #3 ]

An IT admin of a manufacturing company claimed that they had an “Enterprise Storage” system for a few years, and could not figure out why his hard disk drives would die every 12-15 months.

He figured out that the drives supplied by his vendor were consumer SATA drives, even though he was told it was an “Enterprise Storage” system when he bought the system.

Continue reading →

Do we still need FAST (and its cohorts)?

By cfheoh | February 1, 2021 - 9:00 am |January 22, 2021 Algorithm, Analytics, Dell, DellEMC, Disks, EMC, Filesystems, Flash, HP, HPE, IBM, ILM, Intel, iXsystems, RAID, Solid State Devices, Storage Optimization, Storage Tiering

Leave a comment

In a recent conversation with an iXsystems™ reseller in Hong Kong, the topic of Storage Tiering was brought up. We went about our banter and I brought up the inter-array tiering and the intra-array tiering piece.

After that conversation, I started thinking a lot about intra-array tiering, where data blocks within the storage array were moved between fast and slow storage media. The general policy was simple. Find all the least frequently access blocks and move them from a fast tier like the SSD tier, to a slower tier like the spinning drives with different RPM speeds. And then promote the data blocks to the faster media when accessed frequently. Of course, there were other variables in the mix besides storage media and speeds.

My mind raced back 10 years or more to my first encounter with Compellent and 3PAR. Both were still independent companies then, and I had my first taste of intra-array tiering

The original Compellent and 3PAR logos

I couldn’t recall which encounter I had first, but I remembered the time of both events were close. I was at Impact Business Solutions in their office listening to their Compellent pitch. The Kuching boys (thank you Chyr and Winston!) were very passionate in evangelizing the Compellent Data Progression technology.

At about the same time, I was invited by PTC Singapore GM at the time, Ken Chua to grace their new Malaysian office and listen to their latest storage vendor partnership, 3PAR. I have known Ken through my NetApp® days, and he linked me up Nathan Boeger, 3PAR’s pre-sales consultant. 3PAR had their Adaptive Optimization (AO) disk tiering and Dynamic Optimization (DO) technology.

Continue reading →

StorageGRID gets gritty

By cfheoh | March 9, 2020 - 7:06 am |March 9, 2020 Acquisition, Amazon Web Services, Analytics, API, Appliance, Artificial Intelligence, Backup, Big Data, Cloud, Clusters, Data Archiving, Data Fabric, Data Management, Data Protection, Deep Learning, Filesystems, HDS, Hitachi Vantara, ILM, Machine Learning, NAS, NetApp, Object Storage, Software Defined Storage, Storage Field Day, Storage Market Share, Storage Optimization, Tech Field Day

2 Comments

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at the event. The content of this blog is of my own opinions and views ]

NetApp® presented StorageGRID® Webscale (SGWS) at Storage Field Day 19 last month. It was timely when the general purpose object storage market, in my humble opinion, was getting disillusioned and almost about to deprive itself of the value of what it was supposed to be.

“Cheap and deep“, “Race to Zero” were some of the less storied calls I have come across when discussing about object storage, and it was really de-valuing the merits of object storage as vendors touted their superficial glory of being in the IDC Marketscape for Object-based Storage 2019.

Almost every single conversation I had in the past 3 years was either explaining what object storage is or “That is cheap storage right?”

Continue reading →

Category Archives: ILM