[ This is part two of “Where are your files living now?”. You can read Part One here ]
“Data locality, data mobility” is a term I like to use a lot when describing data consolidation, and it led to my discussion of files and folders, and where they live, in my previous blog. The fact that files and folders can now live almost anywhere, across a plethora of premises, stretches the premise of SSOT (Single Source of Truth). And this expatriation of files with minimal checks and balances disturbs me.
A year ago, just before I joined iXsystems, I was given embargoed news from Google®, probably a week before it announced BigQuery Omni. I was then interviewed by Enterprise IT News, a local Malaysian technology news portal, for an opinion quote. This was what I said:
“’The data warehouse in the cloud’ managed services of Big Query is underpinned by Google® Anthos, its hybrid cloud infra and service management platform based on GKE (Google® Kubernetes Engine). The containerised applications, both on-prem and in the multi-clouds, would allow Anthos to secure and orchestrate infra, services and policy management under one roof.”
I further commented: “The data repositories remaining in each cloud are good for addressing data sovereignty and data security concerns, but the announcement did not mention how it addresses a ‘single source of truth’ across multi-clouds.”
“The need to remove duplicates of data, and to sort, merge and filter it, can be challenging, and unless and until there are ways or tools to ensure such data handling and enablement across multi-clouds, the spectre of data silos remains. At particular data points, data from different data repositories has to be joined to be relevant. This was not mentioned clearly, or perhaps a deeper dive is needed to understand more.”
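To make the duplicate problem in that quote concrete, here is a minimal sketch of one common approach: hashing file contents to find identical files scattered across repositories. The repository roots here are local mount points standing in for different cloud stores (an assumption for illustration only); real multi-cloud deduplication would go through each provider's object APIs or checksums rather than a filesystem walk.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Group files under several repository roots by content digest.

    Returns a dict mapping digest -> list of paths; any entry with more
    than one path is the same file living in more than one place.
    """
    by_digest = {}
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                by_digest.setdefault(file_digest(path), []).append(path)
    return {d: ps for d, ps in by_digest.items() if len(ps) > 1}
```

A dedupe pass like this only finds byte-identical copies; the harder merge, sort and join work the quote describes still needs knowledge of the data inside the files, which is exactly why multi-cloud tooling matters.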
I am not sure if my comments deflated Google®’s exuberant announcement, but I hope I injected some reality into the “marketing hype” that technology companies like to engage in ever so often. These fluffy news pieces often disturb me.
Consolidation, not centralization
Consolidation does not mean centralization. We cannot centralize all data on one premises. We cannot stop the diaspora of files and folders to everywhere. In cloud speak, objects and buckets are permeating information management systems, but files and folders remain the dominant data structures and the most human-friendly interface for interacting with data.
To work with this widespread presence of files and folders in public and private clouds, mobile devices, edge devices and so on, we must think in terms of data locality and data mobility models, because where the data lives, whether local, mobile or remote, determines speed of access, consolidation for protection and security, cost of delivery, and the services procured for access. The “data locality, data mobility” mindset must be cultivated and honed in a solution or data architect when designing an information system that involves storage infrastructure, storage repositories, the workflows and pipelines of data, and the lifecycle of how the data is used.
A USD$100 million error
We often hear the adage “Data is the new oil”. It is even more acute in Oil & Gas exploration and production (E&P), because data, and only data, is used to seek out Earth’s most valuable energy resources in the subsurface. It is data that helps the G&G (geology and geophysics) folks find oil and gas, and this mentality has been part of the industry for over a century.
Oil and Gas drilling is expensive. A typical rig, from initial setup to operational drilling, could easily cost USD$20-25 million on land, and twice as much offshore. In deep and ultra-deep waters, the cost could be as much as 3x.
That is why it is absolutely critical to make sure that the data in those files is accurate and relevant. When you have wrong or irrelevant information in the subsurface files, bad things happen. There is hearsay that one Oil & Gas venture erred in rig placement and operations in the Gulf of Thailand many years ago. The initial drill cost them USD$25 million. They erred 4 times because of wrong information. That was a very costly USD$100 million error.
In information systems design, there is an organizational data framework called DIKW. It stands for Data, Information, Knowledge, Wisdom. It is often represented in the form of a pyramid.
Going deeper, DIKW is a set of methodologies that advance data through tiers, where each tier defines the data and increases its value to the organization. The data, in context, is attended to, cultivated, refined, preserved and even destroyed throughout its lifecycle within the organization. This practice, over time, creates the data-driven culture that many organizations try to espouse.
I have written several LinkedIn articles in the past on the subject. You can read them here:
- [ January 2015 ] Data, Information, Knowledge, Wisdom
- [ October 2014 ] E&P Data Management issues are universal
- [ August 2014 ] Going on cheap for data management? You shouldn’t.
- [ November 2014 ] EP Data Management Loud and Proud
- [ August 2016 ] Great Data Analytics comes from great Data Management
However, with the advent of cloud, edge and mobile computing, and the extensive global spread of files, folders and data, I wonder how driven and enterprising these millennial ventures (including established institutions leaping in under the banner of digital transformation) are in developing a DIKW culture for handling and developing data throughout their organizations.
History is a great teacher
“Those who fail to learn from history are doomed to repeat it” – Winston Churchill (paraphrasing George Santayana)
When I wrote “Where are your files living now?” last week, my objective was to highlight the ease and convenience of placing our files and folders everywhere. They might exist as application files, image files, JSON or XML files, or just about anything else. But underneath the comfort of getting our files and folders anytime, anywhere, on any device (a fashionable buzzword much touted by storage technology and cloud storage vendors), there is a duty for solution architects, systems engineers, data analysts and application developers to think about the credos of SSOT and DIKW in data and file management. And with those credos, we must ponder where and how the files and folders are placed and used, whether they are local, remote or mobile.
The Oil & Gas subsurface industry has stood for more than a century, relying almost entirely on data kept in files and folders (digital and physical). The ups and downs of the industry continue to provide history lessons for other industries: health & life sciences, media content archiving and preservation, and many more.
Yes, what I have written here about data and files may be archaic, boring, and not in concord with millennial-type applications like DeFi, NFTs (non-fungible tokens), blockchain, IoT, data science, AI, etc., etc., blah, blah, blah. But the (history) lesson here is that the philosophy of data management and file management does not change.
With that in mind, I hope more storage technology and infrastructure vendors do their part and design offerings that allow greater latitude for data consolidation, which eventually leads to distinguished information management. Single Source of Truth (SSOT) matters!