Rethinking data processing frameworks in real time

“Row, row, row your boat, gently down the stream…”

Except the stream isn’t gentle at all in the new context of data processing.

For many of us in the storage infrastructure and data management world, the well-known framework is storing data on and retrieving data from a storage medium. That medium could be a disk-based storage array, a tape, or some cloud storage where the storage media is abstracted from the users and the applications. The model of post-processing the data after it has been safely and persistently stored on that medium is a well understood and mature one. Users, applications and workloads (A&W) process this data in its resting phase: retrieve it, work on it, and write it back to the resting phase again.

There is another model of data processing that has been bubbling over the years and is now reaching a boiling point, although it has not peaked yet. This is processing the data in flight, while it is still flowing, as it passes through the processing engine. The nature of this kind of data is well described in a 2018 conference presentation I chanced upon a year ago.

letgo marketplace processing numbers in 2018

  • NRT = near real time

From a storage technology infrastructure perspective, this kind of data processing piqued my curiosity immensely. I have been studying this burgeoning data processing model in my spare time, working out where it fits, and bringing that understanding back to the storage infrastructure and data management side.
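To make the in-flight idea concrete, here is a minimal sketch in plain Python of a tumbling-window aggregation over a stream. The event names are made up, loosely echoing a marketplace workload; a real engine (Kafka Streams, Flink, Spark Streaming and the like) does this at scale, but the shape of the computation is the same: the raw events are never written to resting storage, only the aggregated snapshots flow downstream.

```python
from collections import defaultdict

def event_stream():
    """Simulate an unbounded stream of (timestamp, event_type) tuples."""
    sample = ["listing_view", "chat_message", "listing_view", "purchase"]
    for i, ev in enumerate(sample * 3):
        yield (i, ev)  # monotonically increasing "timestamp"

def tumbling_window_counts(stream, window_size=4):
    """Aggregate events in flight: emit per-type counts every
    window_size events, without persisting the raw stream."""
    counts, seen = defaultdict(int), 0
    for ts, ev in stream:
        counts[ev] += 1
        seen += 1
        if seen % window_size == 0:
            yield dict(counts)  # snapshot goes downstream
            counts.clear()

for window in tumbling_window_counts(event_stream()):
    print(window)
```

The contrast with the at-rest model is in the `yield`: the processing engine holds only a small amount of state (the current window), not the data itself.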


Where are your files living now?

[ This is Part One of a longer conversation ]

EMC2 (before the Dell® acquisition) in the 2000s had the tagline “Where Information Lives™”**. This was before the time of cloud storage. The tagline was an adage of enterprise data storage, apt and contemporaneous to the persistent narrative of the time – Data Consolidation. Within the data consolidation stories, thousands of files and folders moved about organizations’ networks, from servers to clients and clients to servers. NAS (Network Attached Storage) was, and still is, the workhorse of many, many organizations.

[ **Side story ] There was an internal anti-EMC joke within NetApp® called “Information has a new address”.

EMC tagline “Where Information Lives”

This was a time when there were almost no concerns about Shadow IT; ransomware was less known; and most importantly, almost everyone knew, more or less, where their files and folders were (except in Oil & Gas upstream – to be told later in this blog). That was because there were concerted attempts to consolidate data, and with it files and folders, in the organization.

Even when these organizations were spread across the world, there were distributed file technologies at the time that could deliver files and folders in an acceptable manner. Definitely not as good as what we have today in a cloudy world, but acceptable. I personally worked on a project setting up the Andrew File System for Intel® in Penang in the mid-90s, almost joined Tacit Networks in the mid-2000s, and dabbled in Microsoft® Distributed File System with NetApp® and Windows File Servers while fixing the mountains of issues in deploying the worldwide GUSto (Global Unified Storage) project at Shell in 2006. Somewhere in my chronological listings are Acopia Networks (acquired by F5) and, of course, EMC2 Rainfinity and the NetApp® NuView OEM, Virtual File Manager.

The point I am trying to make here is that most IT organizations had a good grip on where the files and folders were. I do not think this is true anymore. Do you know where your files and folders are living today?


The Heart of Digital Transformation is …

Businesses have taken up Digital Transformation in different ways and at different paces. In Malaysia, company boardrooms are accepting Digital Transformation as a core strategic initiative, crucial to developing competitive advantage in their respective industries. Time and time again, we are reminded that Data is the lifeblood, and Data fuels the Digital Transformation initiatives.

The rise of CDOs

In line with the rise of the Digital Transformation buzzword, I have seen several unique job titles emerge over the past few years. Among those titles, “Chief Digital Officer”, “Chief Data Officer” and “Chief Experience Officer” are some eye-catching ones. I have met a few of them, and so far, those I met were outward facing, customer facing. In most of my conversations with them, they projected a front that their organization, their business and their operations had been digitally transformed. They are ready to help their customers transform. Are they?

Tech vendors add more fuel

The technology vendors have an agenda: to sell their solutions and their services. They paint aesthetically pleasing stories of how their solutions and wares can digitally transform any organization, and customers latch on to this ‘shiny’ tech. End users become fixated on the idea that technology is the core of Digital Transformation. They are wrong.

Missing the Forest

As I gather more insights through observations, conversations and experiences, I think most of the “digital transformation ready” organizations are not adopting the right approach to Digital Transformation.

Digital Transformation is not tactical. It is not a one-time, big bang action that shifts from not-digitally-transformed to digitally-transformed in a moment. It is not a sprint. It is a marathon. It is a journey that will take time to mature. IDC was spot-on with its Digital Transformation MaturityScape Framework when it first released the framework years ago.

IDC Digital Transformation Maturityscape


We got to keep more data

Guess which airport has won the most awards in the annual Skytrax list? Guess which airport has won 480 awards since its opening in 1981? Guess how this airport did it?

Data Analytics gives the competitive edge.

Serving and servicing more than 65 million passengers and travellers in 2018, and growing, Changi Airport Singapore sets a very high bar for customer service. And it does it with the help of technology, something they call the Smart (Service Management through Analytics and Resource Transformation) Airport. In the ultra-competitive and cut-throat airline business, the deep integration of customer-centric services and the ultimate traveller’s experience is crucial to the survival and growth of airlines. It has certainly helped Singapore Airlines become the world’s best airline in 2018, its 4th win.

To achieve that, Changi Airport relies on technology and lots of relevant data for deep insights on how to serve its customers better. The details are well described in this old news article.

Keep More Relevant Data for Greater Insights

When I say more data, I do not mean every single piece of data. Data has to be relevant to be useful.

How do we get more insights? How can we teach systems to learn? How do we develop artificial intelligence systems? By feeding more relevant data into data analytics systems, machine learning and the like.

As such, a simple framework that builds from data ingestion, to data repositories, to outcomes such as artificial intelligence, predictive and recommendation systems, automation and new data insights is not difficult to understand. The diagram below is a high-level overview of what I work with most of the time.
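In code form, a toy version of that ingestion-to-repository-to-outcome flow might look like the sketch below. All the names, fields and the relevance threshold are hypothetical, just to show the three stages feeding each other:

```python
from collections import Counter

def ingest(raw_records):
    """Ingestion: parse raw lines into structured records."""
    for line in raw_records:
        user, category, amount = line.split(",")
        yield {"user": user, "category": category, "amount": float(amount)}

def store_relevant(records, min_amount=10.0):
    """Repository: keep only records relevant to the question at hand."""
    return [r for r in records if r["amount"] >= min_amount]

def insight(repository, top_n=2):
    """Outcome: a basic analytic, e.g. the most active categories."""
    return Counter(r["category"] for r in repository).most_common(top_n)

raw = ["alice,books,25.0", "bob,games,5.0", "carol,books,40.0", "dave,food,12.5"]
repo = store_relevant(ingest(raw))
print(insight(repo))  # the categories appearing most often among relevant records
```

The filtering step is the point made above: the repository holds relevant data, not every single piece of data, and the quality of the outcome depends on it.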

Commvault UDI – a new CPUU

[Preamble: I am a delegate of Storage Field Day 14. My expenses, travel and accommodation are paid for by GestaltIT, the organizer, and I am not obligated to blog about or promote the technologies presented at this event. The content of this blog is my own opinions and views.]

I am here at Commvault GO 2017. Bob Hammer, Commvault’s CEO, is on stage right now. He shares his wisdom and the message is clear: IT to DT. IT to DT? Yes, Information Technology to Data Technology. It is all about the DATA.

The data landscape has changed. The cloud has changed everything. And data is everywhere. This omnipresence of data presents new complexity and new challenges. It is great to get Commvault acknowledging and accepting this change and the challenges that come along with it, and introducing their HyperScale technology and their secret sauce – Universal Dynamic Index.


Greenplum looking mighty sweet

Big data is Big Business these days. IDC predicts that between 2012 and 2020, spending on big data solutions will account for 80% of IT spending, growing at 18% per annum. EMC predicts that big data is worth USD$70 billion! That is a huge market.

We generate data, and plenty of it. The IDC Digital Universe Report for 2011 (sponsored by EMC) estimated that approximately 1.8 zettabytes of data would be created and replicated in 2011. How much is 1 zettabyte, you say? Look at the conversion below:

                    1 zettabyte = 1 billion terabytes

That’s right, folks. 1 billion terabytes!
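A quick sanity check of that conversion, in decimal (SI) units:

```python
# SI prefixes: zetta = 10^21, tera = 10^12, so the ratio is 10^9.
ZETTABYTE = 10**21  # bytes
TERABYTE = 10**12   # bytes

print(ZETTABYTE // TERABYTE)  # 1000000000, i.e. 1 billion terabytes
```

(If you count in binary units, 1 zebibyte is 2^70 bytes, but the IDC figures use decimal units.)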

And this “mountain” of data and information is a goldmine of goldmines, and companies around the world are scrambling to tap into this treasure chest. According to Wikibon, big data has the following characteristics:

  • Very large, distributed aggregations of loosely structured data – often incomplete and inaccessible
  • Petabytes/exabytes of data
  • Millions/billions of people
  • Billions/trillions of records
  • Loosely-structured and often distributed data
  • Flat schemas with few complex interrelationships
  • Often involving time-stamped events
  • Often made up of incomplete data
  • Often including connections between data elements that must be probabilistically inferred

But what is relevant is not the definition of big data, but rather what you get from the mountain of information generated. The ability to “mine” the information from big data, now popularly known as Big Data Analytics, has sparked a new field within the data storage and data management industry. This is called Data Science. And companies and enterprises that are able to effectively use the new data from Big Data will win big in the next decade. Activities such as

  • Making business decisions
  • Gaining competitive advantage
  • Driving productivity growth in relevant industry segments
  • Understanding consumer and business behavioural patterns
  • Knowing buying decisions and business cycles
  • Yielding new innovation
  • Revealing customer insights
  • and much, much more

will drive a whole new paradigm that shall be known as Data Science.

And EMC, having purchased Greenplum more than a year ago, started their Data Computing Products Division immediately after the Greenplum acquisition. And in October 2010, EMC announced their Greenplum Data Computing Appliance with some impressive numbers, using the 2 configurations of their appliance noted below.

[Table: the 2 Greenplum Data Computing Appliance configurations]

[Tables: Greenplum performance benchmarks for the 2 configurations]
That is what these big data appliances are capable of. The ability to load billions of structured or unstructured files or objects in mere minutes is what drives the massive adoption of Big Data.

And a few days ago, EMC announced their Greenplum Unified Analytics Platform (UAP), which comprises 3 Greenplum components:

  • A relational database for structured data
  • An enterprise Hadoop engine for the analysis and processing of unstructured data
  • Chorus 2.0, which is a social media collaboration tool for data scientists

The diagram below summarizes the UAP solution:

Greenplum is certainly ahead of the curve. Competitors like IBM Netezza, Teradata and Oracle Exadata are racing to catch up, but Greenplum is one of the early movers with a single platform for big data. Having a consolidated platform will not only reduce costs (integrating all the big data components usually incurs high professional services fees) but will also lower the barrier to entry to big data, further accelerating its adoption.

Big Data is still very much in its infancy, and EMC is pushing to establish its footprint in this space. EMC Education announced the general availability of big data-related courses last week, along with the EMC Data Science Architect (EMC DSA) certification. Greenplum is enjoying the early sweetness of the Big Data game and there will be more to come. I am certainly looking forward to sharing more on this plum (pun intended ;-)) of data storage and data management excitement.