A recent report intrigued me. With the relentless rise of data, data and more data, the sheer volume we are collecting and storing is getting a bit absurd. The flip side is that we might need all this data for analytics, to squeeze more insight out of it.
The Veritas Darkberg report revealed that a very large percentage of the data collected and stored by organizations is useless: unknown and unused. I captured a snapshot of the report below:
The screenshot above shows that 54% of the data landscape surveyed is dark data, unseen and clogging up storage. In an instant, the Darkberg (a cross of “Dark” and “Iceberg”) report knocked a lot of sense into this whole data acquisition frenzy we are going through right now.
I recall a penetration test job I took on about 5 years ago. A local university called, wanting to understand why their 34Mbps line was chugging along sluggishly. We went in to do the job, pen-test and all. Then it took an interesting turn when we saw that the high network throughput was coming from a particular production server tied to an HDS TagmaStore. We requested the admin password to the server and proceeded to assess the file system of the server in question. A couple of decent-sized LUNs, 5TB each, were provisioned to the server. A few days later, the tests were concluded.
A week later, we published our report to the head of IT of the computer science department. What it revealed was quite shocking to him, to say the least. Someone with the administrator password (and many people had it) to the production server and the HDS TagmaStore storage had been running his or her own little torrent business on them. Both LUNs were repositories for torrent streams; the downloads were clogging up the 34Mbps line, and the storage LUNs were thrashing between the streaming downloads and the Postgres database they were serving. To the university this data was utterly useless (though not to the fellas running that underground scheme of theirs), but the file system assessment we did clearly revealed what data was valuable and what was not.
As time passed, the file system assessment service slowly hurtled towards oblivion as storage vendors started touting storage tiering that veered towards performance, pricing and spindles rather than the value of the file contents in the storage itself.
Back when I was at Interica not too long ago, the PARS, PRM and SmartMove solutions carried the very DNA of file system assessment. To an IT professional they look like storage tiering, but in reality they reveal much more about the data and file landscape of an organization or a seismic project.
Take, for instance, Interica PRM (Project Resource Manager). It has the unique capability to be transparent and open to almost all seismic file types, be it Petrel, R5000, EPOS, SEG-Y, Eclipse, Geoframe and many more. Through Interica connectors, PRM provides an incredible amount of information, offering immense insight to a project data manager. A screenshot of PRM is shown below:
In the past few years, I have been lamenting the waning prominence of such tools, because not many IT folks talk about them at all. I recall tools such as EMC DiskXtender or DX (which I worked on integrating with EMC Centera) and SAMFS from Sun, which swept the storage landscape for inactive files (a precursor to dark data?), putting them where they belonged.
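Conceptually, what those inactive-file sweeps did can be sketched in a few lines of Python. This is only an illustrative sketch, not anything from DX or SAMFS; the root path and the one-year age threshold are assumptions for the example:

```python
import os
import time

def find_inactive_files(root, days=365):
    """Walk a directory tree and collect (path, size) for files not
    accessed in `days` days -- the kind of candidates a DX- or
    SAMFS-style tool would migrate to a cheaper tier."""
    cutoff = time.time() - days * 86400
    inactive = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                inactive.append((path, st.st_size))
    return inactive

if __name__ == "__main__":
    # Illustrative root path; report inactive files and the capacity they tie up
    files = find_inactive_files("/data/projects", days=365)
    total = sum(size for _path, size in files)
    print(f"{len(files)} inactive files, {total / 1e9:.1f} GB reclaimable")
```

Real tools layer policy on top of a sweep like this (hierarchical storage management, stubbing, recall on access), but the heart of it is exactly this: separate the active from the inactive, then put each where it belongs.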
Both DX and SAMFS have gone the way of the dinosaurs, but Interica SmartMove has been enjoying a bit of an uptick lately, thanks to the repositioning of the SmartMove software to address and combat the new menace of dark data.
Similarly, Veritas Data Insight is another gem to look at. It maps the social networks of user collaboration around files and provides deeper insight into data risks and governance, especially with the deluge of unstructured data that many organizations face today. I must admit that I am largely ignorant of Data Insight, but the solution has an interesting, modernized facade for addressing the data landscape of any environment today.
As we speak, the phantom menace of dark data is lurking and growing. Organizations that lack the discipline to manage data diligently and intelligently will face the dark ages: bloated storage and choked network conduits. Just like the dark side of the Force, dark data may have already snagged the very top and infiltrated the very core of an organization's data landscape.
And it is the duty of this blog to remind all that there is no such thing as business as usual. We must forever be vigilant.