Greenplum looking mighty sweet

Big data is Big Business these days. IDC predicts that between 2012 and 2020, spending on big data solutions will account for 80% of IT spending and grow at 18% per annum. EMC predicts that the big data market is worth USD 70 billion! That’s a huge market.

We generate data, and plenty of it. According to the IDC Digital Universe Report for 2011 (sponsored by EMC), approximately 1.8 zettabytes of data will be created and replicated in 2011. How much is 1 zettabyte, you say? Look at the conversion below:

                    1 zettabyte = 1 billion terabytes

That’s right, folks. 1 billion terabytes!
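For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion in Python. It assumes decimal (SI) units, where 1 terabyte is 10^12 bytes and 1 zettabyte is 10^21 bytes; the constant and function names are my own, made up purely for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: not binary units).
TERABYTE = 10**12
ZETTABYTE = 10**21

# Integer division keeps the ratio exact: 10**21 // 10**12 == 10**9.
TERABYTES_PER_ZETTABYTE = ZETTABYTE // TERABYTE


def zettabytes_to_terabytes(zettabytes: float) -> float:
    """Convert a size in zettabytes to terabytes (hypothetical helper)."""
    return zettabytes * TERABYTES_PER_ZETTABYTE


if __name__ == "__main__":
    # 1 zettabyte, and the 1.8 zettabytes quoted from the IDC report.
    print(zettabytes_to_terabytes(1))
    print(zettabytes_to_terabytes(1.8))
```

So the 1.8 zettabytes quoted for 2011 works out to roughly 1.8 billion terabytes.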

And this “mountain” of data and information is a goldmine of goldmines, and companies around the world are scrambling to tap into this treasure chest. According to Wikibon, big data has the following characteristics:

  • Very large, distributed aggregations of loosely structured data – often incomplete and inaccessible
  • Petabytes/exabytes of data
  • Millions/billions of people
  • Billions/trillions of records
  • Loosely-structured and often distributed data
  • Flat schemas with few complex interrelationships
  • Often involving time-stamped events
  • Often made up of incomplete data
  • Often including connections between data elements that must be probabilistically inferred

But what is relevant is not the definition of big data, but rather what you get from the mountain of information generated.  The ability to “mine” the information from big data, now popularly known as Big Data Analytics, has sparked a new field within the data storage and data management industry. This is called Data Science. And companies and enterprises that are able to effectively use the new data from Big Data will win big in the next decade. Activities such as

  • Making business decisions
  • Gaining competitive advantage
  • Driving productivity growth in relevant industry segments
  • Understanding consumer and business behavioural patterns
  • Knowing buying decisions and business cycles
  • Yielding new innovation
  • Revealing customer insights
  • and much, much more

will drive a whole new paradigm that shall be known as Data Science.

EMC, having purchased Greenplum more than a year ago, started its Data Computing Products Division immediately after the acquisition. In October 2010, EMC announced its Greenplum Data Computing Appliance with some impressive numbers, using the 2 configurations of the appliance noted below:

 

Below are 2 tables of the Greenplum performance benchmarks:

 

 

That’s what these big data appliances are able to do. The ability to load billions of either structured or unstructured files or objects in mere minutes is what drives the massive adoption of Big Data.

And a few days ago, EMC announced its Greenplum Unified Analytics Platform (UAP), which comprises 3 Greenplum components:

  • A relational database for structured data
  • An enterprise Hadoop engine for the analysis and processing of unstructured data
  • Chorus 2.0, which is a social media collaboration tool for data scientists

The diagram below summarizes the UAP solution:

Greenplum is certainly ahead of the curve. Competitors like IBM Netezza, Teradata and Oracle Exalogic are racing to get ahead, but Greenplum is one of the early adopters of a single platform for big data. Having a consolidated platform will not only reduce costs (integrating all the big data components usually incurs high professional services fees) but will also lower the barrier to entry to big data, thus further accelerating its adoption.

Big Data is still very much in its infancy and EMC is pushing to establish its footprint in this space. Last week, EMC Education announced the general availability of courses related to big data, as well as the EMC Data Science Architect (EMC DSA) certification. Greenplum is enjoying the early sweetness of the Big Data game and there will be more to come. I am certainly looking forward to sharing more on this plum (pun intended ;-)) of the data storage and data management excitement.

About cfheoh

I am a technology blogger with 20+ years of IT experience. I write heavily on technologies related to storage networking and data management because that is my area of interest and expertise. I introduce technologies with the objective of getting readers to *know the facts*, and to use that knowledge to cut through the marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress. I am involved in SNIA (Storage Networking Industry Association) and, as of October 2013, I have been appointed as the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I was previously the Chairman of SNIA Malaysia until Dec 2012. I have recently joined Hitachi Data Systems as an Industry Manager for Oil & Gas in Asia Pacific. The position does not require me to be super-technical (which is what I love) but it helps develop another facet of my career, which is building communities and partnerships. I think this is crucial and more wholesome than just being technical alone. Given my present position, I am not obligated to write about HDS and its technology, but I am indeed subject to the Social Media Guidelines of the company. Therefore, I would like to make a disclaimer that what I write is my personal opinion, and mine alone. I am responsible for what I say and write, and this statement indemnifies my employer from any damages.