[Preamble: I was a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation were paid for by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]
I am a big proponent of Go-to-Market (GTM) solutions. Technology does not stand alone. It must live in an ecosystem, and in each segment of each industry, every ecosystem is unique. When we amalgamate data, storage infrastructure technologies and data management into that ecosystem, we reap its benefits.
Data moves in the ecosystem, from system to system, north to south, east to west and vice versa, random, sequential, ad hoc. Data acquires different statuses, different roles, different relevances through its lifecycle in the ecosystem. From this movement we derive a flow, a workflow of data that creates a data pipeline. The Data Pipeline concept has been around since the inception of data.
To illustrate my point, I created one for the Oil & Gas – Exploration & Production (EP) upstream some years ago.
From the diagram above, we can see different applications in each segment of this EP Data Pipeline. I try to match the storage infrastructure and data services related to each segment, drawing from my experiences over the years. We learn about the applications and data management pain points, and pitch the best possible technologies and operational best practices to solve the complex EP requirements and issues.
On my recent Storage Field Day 15 trip, I was re-introduced to the Data Pipeline, but in a newer setting and context. This was at NetApp, and it was about addressing the AI (Artificial Intelligence) Data Pipeline. The old adage of “old wine in a new bottle”.
The digital transformation momentum has definitely pushed the envelope of change, and the technology landscape is now peppered with so many buzzwords, so much jargon and some whatchamacallits that have lost their meanings through crazy transmogrifications. Fellow delegate Dr. Rachel Traylor put up a not-too-subtle, Shania Twain-y commentary about the overuse of industry buzzwords, unique and funny, but too true to ignore.
It is great to see organizations such as NetApp taking an interest in creating data pipelines again. I am saying this because during my stint as NetApp Malaysia’s Country Manager in 2015-2016, data pipeline GTM solutions for a few significant industries in Asia South were brought up briefly but were shot down after a bout of executive politics. In the end, the NetApp GTM solutions in Asia South never saw the light of day.
Back to Field Day, NetApp’s Data Pipeline takes an edge-to-core-to-cloud approach. This makes sense: as the technologies spread beyond the data centers and cloud providers to the edge and end-devices, IoT (Internet of Things) and serverless, Lambda-like computing (more buzzwords) will bring far more data points. This will require scaling, resiliency and performance unlike anything we have seen at present. The data growth could become unsustainable if data is not managed well.
In this NetApp-curated AI Data Architecture and Data Lifecycle, the scale goes beyond the confines of the data center, and the “Data Mobility, Data Locality” mantra applies. The data moves, orchestrated with workflows, pipelines and automation to get the maximum benefits, unhindered and unshackled from data silos and poor data management practices. And it will be done through secure data policies, compliant with regulatory requirements and the digital borders of data sovereignty.
I particularly gravitated to Santosh Rao’s blog – Choosing an Optimal Filesystem and Data Architecture for Your AI/ML/DL Pipeline – because it addressed the important relationship between data, the behaviour of data, and the workflow (pipeline) in the AI data architecture and data lifecycle.
Of course, where there is honey, there are bees. Another of my ex-employers, Hitachi Vantara, has been chugging along with their own data pipelines for machine learning and deep learning (ML/DL). In a recent article aptly titled “Data Pipeline need love too“, Computer Weekly pointed out Hitachi Vantara’s advantage in their version of Data Pipelining.
The pipeline concept is also very evident in the content creation industry. I did some work there, sold a few ZFS storage systems, and got to learn about their production pipeline as well. I am not a deep practitioner here, but here is a simplistic overview of the content creation pipeline.
In my book, it is critical to own the Data Pipeline. Storage technologists and data management professionals move themselves up the value chain and speak the lingo of each segment unique to that industry. Combining the experiences of both sides of the data pipeline, storage infrastructure and data management on one side and the proprietary applications and operational challenges on the other, will undoubtedly deliver deeper value than just being technology pushers.
So, altogether now, can we say Data Pipelines?