Let’s face it. Data is bursting through its storage seams. And every organization is now storing too much data, much of which it doesn’t even know it has.
IDC predicts that by 2025, 80% of the world’s data will be unstructured. IDC’s Global Datasphere Forecast 2021-2025 report sees global data creation and replication capacity expanding to 181 zettabytes, an unfathomable figure. Organizations are inundated. They struggle with data growth, with little understanding of what data they have, where it resides, what to do with it, and how to manage the deluge.
The simple knee-jerk action is to store it in cloud object storage where the price of storage is $0.0000xxx/GB/month. But many IT departments overlook the fact that the data they have parked in the cloud requires movement between the cloud and on-premises. I have been involved in numerous discussions where customers realized that they moved the data in the cloud too frequently. Often it was an error of judgement or short-term blindness (blinded by the cheap storage costs, no doubt), further exacerbated by the pandemic. These oversights have resulted in expensive and painful monthly API call and egress fees. Welcome to reality. Suddenly the cheap cloud storage doesn’t sound so cheap after all.
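To see how quickly data movement can overtake the headline storage price, here is a rough back-of-envelope sketch in Python. Every unit price and volume in it is a made-up assumption for illustration, not any provider’s actual rate, and `CloudPricing` and `monthly_cost` are names I invented for this example.

```python
from dataclasses import dataclass


@dataclass
class CloudPricing:
    """Hypothetical unit prices; substitute your provider's published rates."""
    storage_per_gb_month: float  # price to keep 1 GB stored for a month
    egress_per_gb: float         # price to move 1 GB out of the cloud
    per_1000_requests: float     # price per 1,000 API calls


def monthly_cost(pricing, stored_gb, egress_gb, api_requests):
    """Return (storage, egress, requests, total) charges for one month."""
    storage = stored_gb * pricing.storage_per_gb_month
    egress = egress_gb * pricing.egress_per_gb
    requests = api_requests / 1000 * pricing.per_1000_requests
    return storage, egress, requests, storage + egress + requests


# Illustrative numbers only, not any provider's actual price list.
pricing = CloudPricing(
    storage_per_gb_month=0.004,
    egress_per_gb=0.09,
    per_1000_requests=0.005,
)

# Assumed workload: 100 TB parked in the cloud, 20 TB pulled back
# on-premises in the month, and 50 million API calls.
storage, egress, requests, total = monthly_cost(
    pricing, stored_gb=100_000, egress_gb=20_000, api_requests=50_000_000
)
print(f"storage  ${storage:,.2f}")
print(f"egress   ${egress:,.2f}")
print(f"requests ${requests:,.2f}")
print(f"total    ${total:,.2f}")
```

With these assumed figures, pulling back only a fifth of the stored data costs more than four times the storage charge itself. The point is the shape of the bill, not the exact numbers.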
The same can be said about storing non-active unstructured data on primary storage. Many organizations have not been disciplined enough to practise good data management. The primary Tier 1 storage becomes bloated over time, grinding sluggishly as the data capacity grows. I/O processing becomes painfully slow and backup takes longer and longer. Sound familiar?
The A in ABC
I brought up the ABC mantra a few blogs ago. A is for Archive First. It is part of the conversation repertoire in my data protection consulting practice, and I use it often to advise IT organizations to be smart with their data management. Before archiving (some folks like to call it tiering, but I am not going down that argument today), we must know what to archive. We cannot blindly send all sorts of junk data to the secondary or tertiary storage premises. Doing that is akin to digging another hole to fill up the first hole.
We must know which unstructured data to move, replicate or sync from the Tier 1 storage to a second (or third) less taxing storage premises. We must be able to see this data, observe its behaviour over time, and decide the best data management practice to apply to it. Take note that I said best data management practice and not best storage location in the previous sentence. There has to be a clear distinction: a data management strategy is more prudent than a “best” storage premises. The reason is that many organizations mistakenly think picking the best storage location (the thought of the “cheapest” always seems to creep in) is a good strategy, while ignoring the fact that data is like water. It moves from premises to premises, from on-prem to cloud, from cloud to other clouds. Data mobility is a variable in data management.
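As a very small illustration of “seeing” the data before deciding anything, the sketch below walks a directory tree and totals files and bytes by age since last modification. It is my own minimal example using only the Python standard library, not how any discovery product works. The age thresholds and bucket labels are assumptions, and modification time is only a rough proxy for how active the data is.

```python
import os
import time
from collections import defaultdict

# Hypothetical age buckets in days; tune to your own retention policy.
AGE_BUCKETS = [(90, "active (< 90 days)"), (365, "cooling (90 days to 1 year)")]
OLDEST_LABEL = "cold (> 1 year)"


def bucket_for(age_days):
    """Map a file age in days to a bucket label."""
    for limit, label in AGE_BUCKETS:
        if age_days < limit:
            return label
    return OLDEST_LABEL


def profile_tree(root, now=None):
    """Walk `root` and total the file count and bytes per age bucket."""
    now = time.time() if now is None else now
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - info.st_mtime) / 86400  # seconds per day
            entry = totals[bucket_for(age_days)]
            entry["files"] += 1
            entry["bytes"] += info.st_size
    return dict(totals)
```

Calling `profile_tree` on a share’s mount point returns a dictionary keyed by bucket label, which is enough to start a conversation about how much of Tier 1 has gone cold. A real assessment would also look at owners, file types and access patterns.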
So, if the practice is to archive, then just archive the data. Do not label the data as archived and then keep moving it around freely. There are considerations about costs, security, compliance and basic data hygiene. It is data management discipline I am talking about here.
Datadobi takes a step up
I have been a fan of Datadobi for quite a while. I have followed them since around 2017, and I discovered their booth at Dell® Technologies World Las Vegas in 2019. I have had several great conversations with them in the past and they are an absolute beast in what they do. DobiMigrate and DobiProtect have been the mainstay of their stable. But one thing was always missing to complete their solution portfolio.
The missing piece is data discovery. Datadobi introduced StorageMAP 2 months ago, and I was given a technology tour of the solution by a good friend in Westcon-Comstor Singapore 2 weeks ago. Here are a couple of screenshots from my friend Venkat’s slides that captured my attention.
As you can see, the data discovery feature introduced in Datadobi StorageMAP takes a more modern approach than many of the file lifecycle management software solutions I have encountered and worked on in the past. The ability to tag the unstructured data in both file and object data sources helps create context for the data scanned and discovered. This brings a critical element in further understanding the right data to take action on. This data management lifecycle “wheel” shown below consolidates the power of what Datadobi StorageMAP can do: from discovery to tagging (classification and context) to actionable operations (e.g. archive, migrate, or even delete) on the discovered data.
The capability Datadobi StorageMAP brings to the data management discipline and practice gives data practitioners and data policy architects the tool to more accurately and intelligently determine the best value of the unstructured data in the organization. That is something extremely powerful in my book.
Going into observability and lineage
I can already see the massive potential of Datadobi StorageMAP beyond file and object migration, replication and archiving.
The power of observing and understanding unstructured data that is created, stored and used cannot be overstated. In this era of rapid, real-time data processing, knowing the operations and actions performed on files, and being able to tag and classify the files and move them on to the next part of the data pipeline, is critical. The next stage in the data pipeline can be a more cost-efficient on-premises storage target such as TrueNAS® storage, an S3 target such as MinIO, or a cloud storage service provider. It could assign unstructured data to the right processing bucket or the designated folders for further data processing. Imagine how powerful this can be for the ML/AI (machine learning/artificial intelligence) development workflow.
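To make the tag-then-route idea concrete, here is a toy sketch in Python. It is purely my own illustration and does not reflect StorageMAP’s rule syntax or API. The tag names, the thresholds, and the target URLs (`s3://ml-datasets`, `nfs://archive-nas/cold`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Minimal metadata about one discovered file."""
    path: str
    size_bytes: int
    age_days: float
    tags: set = field(default_factory=set)


# Hypothetical tagging rules as (tag, predicate) pairs.
TAG_RULES = [
    ("ml-training", lambda f: f.path.lower().endswith((".jpg", ".png", ".parquet"))),
    ("cold", lambda f: f.age_days > 365),
    ("large", lambda f: f.size_bytes > 1_000_000_000),
]

# Hypothetical routing table, checked in order; the first matching tag wins.
ROUTES = [
    ("ml-training", "s3://ml-datasets"),   # e.g. a bucket on an S3 target
    ("cold", "nfs://archive-nas/cold"),    # e.g. an on-premises NAS share
]
DEFAULT_TARGET = "stay-on-tier1"


def tag(record):
    """Attach every tag whose rule matches the record."""
    record.tags = {name for name, rule in TAG_RULES if rule(record)}
    return record


def route(record):
    """Pick the next pipeline target from the record's tags."""
    for tag_name, target in ROUTES:
        if tag_name in record.tags:
            return target
    return DEFAULT_TARGET


records = [
    FileRecord("/proj/scan_001.PNG", 2_000_000, 12),
    FileRecord("/home/old/report.doc", 50_000, 900),
    FileRecord("/home/new/notes.txt", 1_000, 3),
]
for r in records:
    print(r.path, sorted(tag(r).tags), "->", route(r))
```

Keeping tagging separate from routing means the same tags can later drive other actions, such as migrate or delete, without touching the classification rules.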
I am already beginning to see this new level of unstructured data observability and data lineage coming to the forefront of unstructured data management. Understanding the changes in the unstructured data, where it came from, what state it is in now, and where it should be going will have a deep, deep impact on the future of unstructured data management. I am pleased to see Datadobi StorageMAP stepping forward into this exciting new future.