A little yellow elephant

By now, I believe most of you in the storage networking world would have heard of Hadoop. Hadoop was created by Doug Cutting while he and his team were working on an open source web search engine called Nutch. The easily recognized little yellow elephant was named after Doug Cutting’s son’s toy elephant, which became Hadoop’s mascot. Pretty cool!

And today, Hadoop has become THE platform for Big Data applications. Why?

As I have mentioned before, everything that we do or don’t do generates data, either as a direct or an indirect product. I am blogging right now and I am creating data. I was in Singapore the whole of this week, and everywhere I went in the MRT stations I was being watched by the video cameras they have at the stations. A new friend in class said that Singapore is the second most “watched” city after London, with video cameras mounted everywhere, either discreetly or otherwise. And that’s just video data. There are plenty of other human activities that generate tons and tons of data.

The IDC Digital Universe Report for 2011 said that we have generated 1.8ZB (zettabytes) of data this year alone. I mentioned in my previous blog that this is a gold mine, and companies are scrambling to tap into this massive amount of data. Extracting valuable information to anticipate the next trend or predict the next evolution in human preference is akin to the Gold Rush in the wild, wild west of the 19th century. Folks, Big Data is going to be this generation’s “Digital Gold Rush”.

Sieving, filtering and processing gazillions of data points (more unstructured than structured) will not work in rigidly defined, well-formatted relational databases. The data model of relational databases simply breaks down. There are, of course, different schools of thought on different data models, but the Hadoop model seems to be gaining momentum and mind share among data scientists. That is because of Hadoop’s capability to deal with massive unstructured data, processing it and producing results in a short amount of time.

One way to process this pool of massive data is parallel programming. In parallel programming, multi-threading is commonly deployed to achieve performance. But implementing multi-threading in parallel programming is difficult. Developers often have to deal with LWPs (lightweight processes), semaphores, shared memory, mutex (mutually exclusive) locking and so on. This style of programming works on shared, mutable state, and the same programming expression can produce different results depending on that state, which makes it hard to reason about.

Hadoop belongs to another school of programming known as functional programming, where shared mutable state is removed. With that, the dependency on state is also removed, resulting in a much easier and simpler parallel programming implementation. Hadoop borrows ideas from the MapReduce software framework made well known by Google, and from the Google File System.
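
To make the contrast concrete, here is a minimal Python sketch of my own (not Hadoop code) showing why shared mutable state needs locking, while a pure function over partitioned data parallelizes trivially:

    from concurrent.futures import ThreadPoolExecutor
    from threading import Lock

    data = list(range(1_000_000))
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]

    # Shared-state style: every worker mutates one global counter, so a lock
    # is needed and correctness depends on how the threads interleave.
    total = 0
    lock = Lock()

    def add_chunk(chunk):
        global total
        with lock:                      # forget this lock and the result can be wrong
            total += sum(chunk)

    with ThreadPoolExecutor() as pool:
        list(pool.map(add_chunk, chunks))
    print(total)

    # Functional style: each worker computes a value from its own partition
    # with no shared state; partial results are simply combined at the end.
    def partial_sum(chunk):
        return sum(chunk)               # no locks, no shared memory

    with ThreadPoolExecutor() as pool:
        print(sum(pool.map(partial_sum, chunks)))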

Before we get to know Hadoop, we must know MapReduce. MapReduce is a framework which allows very large data sets to be processed across a very large set of computer nodes in a cluster. Typically the computational processing is executed in a distributed fashion, spread across many computer nodes, and the final results are consolidated from the sub-results of these distributed processing nodes.

According to Wikipedia, the 2 key functions of MapReduce are map() and reduce(). That’s pretty obvious. The extract below was taken from the Wikipedia definition, and explains both functions very well.

“Map” step: The master node takes the input, partitions it up into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.

“Reduce” step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.
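
To make the two steps concrete, here is a toy word count in plain Python (my own illustration; real Hadoop jobs are typically written in Java), mimicking the map, shuffle and reduce phases:

    from collections import defaultdict

    def map_step(document):
        """Map: emit (key, value) pairs - here, (word, 1) for every word."""
        return [(word, 1) for word in document.split()]

    def reduce_step(word, counts):
        """Reduce: combine all values for one key into a single result."""
        return (word, sum(counts))

    documents = ["the quick brown fox", "the lazy dog", "the fox"]

    # "Map" step: each document (sub-problem) is processed independently.
    mapped = [pair for doc in documents for pair in map_step(doc)]

    # Shuffle: group the intermediate values by key before reducing.
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)

    # "Reduce" step: combine the grouped values to form the final answer.
    result = dict(reduce_step(w, c) for w, c in groups.items())
    print(result)   # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}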

The diagram below can probably simplify the concept of MapReduce for readers.

 

Hadoop is one of the open-source implementations of MapReduce. It is one of the projects of the Apache Software Foundation, and the project has sparked a brand-new niche of data search, data management and data science. The diagram below will allow our readers to juxtapose MapReduce and Hadoop, and compare them in the simplest fashion.

Hadoop’s primary development platform is Java. Hadoop’s architecture consists mainly of 2 components – Hadoop Common and a Hadoop-compatible file system, as shown in the diagram below.

The Hadoop MapReduce layer above is the processing and file/object access interface to the Hadoop-compatible file system below. HDFS (Hadoop Distributed File System) is just one of a few Hadoop-compatible file systems. Other file systems include:

  • Amazon S3 file system, targeted at clusters hosted on the Amazon EC2 Infrastructure-as-a-Service (IaaS) cloud platform
  • CloudStore – a distributed file system written in C++, also inspired by the Google File System
  • FTP file systems
  • HTTP and HTTPS read-only file systems
  • Any file system accessible via the file:// URL scheme

But the main engine of Hadoop is the MapReduce layer. The 2 core components in this layer are the JobTracker and the TaskTracker. Each has its own role to play and collectively, they are key cogs in the Hadoop distributed data processing model.

Below are extracts I picked up from Wikipedia.

Client applications submit MapReduce jobs to the JobTracker. The JobTracker pushes work out to available TaskTracker nodes in the cluster, striving to keep the work as close to the data as possible. With a rack-aware filesystem, the JobTracker knows which node contains the data, and which other machines are nearby. If the work cannot be hosted on the actual node where the data resides, priority is given to nodes in the same rack. This reduces network traffic on the main backbone network. If a TaskTracker fails or times out, that part of the job is rescheduled. The TaskTracker on each node spawns a separate Java Virtual Machine process to prevent the TaskTracker itself from failing if the running job crashes the JVM. A heartbeat is sent from the TaskTracker to the JobTracker every few minutes to check its status. The JobTracker and TaskTracker status and information are exposed by Jetty and can be viewed from a web browser. Jetty is a Java-based HTTP server, among other things.

JobTracker records what it is up to in the filesystem. When a JobTracker starts up, it looks for any such data, so that it can restart work from where it left off.

Scheduling

By default, Hadoop uses first-in, first-out (FIFO) scheduling, with 5 optional scheduling priorities, to schedule jobs from a work queue. In version 0.19 the job scheduler was refactored out of the JobTracker, adding the ability to use an alternate scheduler (such as the Fair Scheduler or the Capacity Scheduler).

Fair scheduler

The fair scheduler was developed by Facebook. The goal of the fair scheduler is to provide fast response times for small jobs and QoS (Quality of Service) for production jobs. The fair scheduler has three basic concepts.

  1. Jobs are grouped into Pools.
  2. Each pool is assigned a guaranteed minimum share.
  3. Excess capacity is split between jobs.

By default jobs that are uncategorized go into a default pool. Pools have to specify the minimum number of map slots, reduce slots, and a limit on the number of running jobs.
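
As a rough illustration of those three concepts (my own simplification, not the actual Fair Scheduler code), the allocation logic looks something like this: every pool first gets its guaranteed minimum share of slots, then the excess capacity is split across the pools.

    def fair_share(total_slots, pools):
        """pools: dict of pool name -> guaranteed minimum map slots."""
        allocation = dict(pools)                     # start with the minimum shares
        excess = total_slots - sum(pools.values())   # capacity left over
        per_pool = excess // len(pools)
        for name in allocation:
            allocation[name] += per_pool             # split the excess capacity
        return allocation

    # 100 map slots, three pools with guaranteed minimums:
    print(fair_share(100, {"production": 40, "ad-hoc": 10, "default": 5}))
    # {'production': 55, 'ad-hoc': 25, 'default': 20}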

Capacity scheduler

The capacity scheduler was developed by Yahoo. The capacity scheduler supports several features which are similar to the fair scheduler.

  • Jobs are submitted into queues.
  • Queues are allocated a fraction of the total resource capacity.
  • Free resources are allocated to queues beyond their total capacity.
  • Within a queue a job with a high level of priority will have access to the queue’s resources.

I took most of the extracts above from Wikipedia, and I don’t claim to be a knowledgeable person on Hadoop. All the credit goes to the Wikipedia editors for putting Hadoop in layman’s terms.

Hadoop has certainly won the hearts of the new digital gold rush that is Big Data, and is slowly becoming a force to be reckoned with among data scientists. Hadoop implementations are powering new frontiers in processing and mining the ever-growing data capacity, giving solution providers a simple programming methodology and data model to gain more insights into the vast seas of data and information.

Hadoop has many fans, and is slowly becoming the data platform for large companies such as Yahoo!, Facebook, IBM, Amazon, Apple, eBay and many more. Facebook even claims to have the largest Hadoop cluster in the world, which had grown to 30PB by July 2011.

This little yellow elephant is going places and one to watch out for.

Greenplum looking mighty sweet

Big data is Big Business these days. IDC predicts that between 2012 and 2020, spending on big data solutions will account for 80% of IT spending, growing at 18% per annum. EMC predicts that big data is worth USD$70 billion! That’s a very huge market.

We generate data, and plenty of it. The IDC Digital Universe Report for 2011 (sponsored by EMC) estimates that approximately 1.8 zettabytes of data will be created and replicated in 2011. How much is 1 zettabyte, you say? Look at the conversion below:

                    1 zettabyte = 1 billion terabytes

That’s right, folks. 1 billion terabytes!
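
If you want to check the arithmetic yourself (using decimal units):

    ZB = 10**21   # bytes in a zettabyte (decimal units)
    TB = 10**12   # bytes in a terabyte
    print(ZB // TB)                    # 1000000000 -> 1 billion terabytes
    print(f"{1.8 * ZB / TB:,.0f} TB")  # 1,800,000,000 TB created in 2011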

And this “mountain” of data and information is a goldmine of goldmines, and companies around the world are scrambling to tap into this treasure chest. According to Wikibon, big data has the following characteristics:

  • Very large, distributed aggregations of loosely structured data – often incomplete and inaccessible
  • Petabytes/exabytes of data
  • Millions/billions of people
  • Billions/trillions of records
  • Loosely-structured and often distributed data
  • Flat schemas with few complex interrelationships
  • Often involving time-stamped events
  • Often made up of incomplete data
  • Often including connections between data elements that must be probabilistically inferred

But what is relevant is not the definition of big data, but rather what you get from the mountain of information generated.  The ability to “mine” the information from big data, now popularly known as Big Data Analytics, has sparked a new field within the data storage and data management industry. This is called Data Science. And companies and enterprises that are able to effectively use the new data from Big Data will win big in the next decade. Activities such as

  • Making business decisions
  • Gaining competitive advantage
  • Driving productivity growth in relevant industry segments
  • Understanding consumer and business behavioural patterns
  • Knowing buying decisions and business cycles
  • Yielding new innovations
  • Revealing customer insights
  • and much, much more

will drive a whole new paradigm that shall be known as Data Science.

And EMC, having purchased Greenplum more than a year ago, started their Data Computing Products Division immediately after the acquisition. In October of 2010, EMC announced their Greenplum Data Computing Appliance with some impressive numbers, using the 2 configurations of the appliance noted below:

 

Below are 2 tables of the Greenplum performance benchmarks:

 

 

That’s what these big data appliances are capable of. The ability to load billions of structured or unstructured files or objects in mere minutes is what drives the massive adoption of Big Data.

And a few days ago, EMC announced their Greenplum Unified Analytics Platform (UAP), which comprises 3 Greenplum components:

  • A relational database for structured data
  • An enterprise Hadoop engine for the analysis and processing of unstructured data
  • Chorus 2.0, which is a social media collaboration tool for data scientists

The diagram below summarizes the UAP solution:

Greenplum is certainly ahead of the curve. Competitors like IBM Netezza, Teradata and Oracle Exadata are racing to get ahead, but Greenplum is one of the early adopters of a single platform for big data. Having a consolidated platform will not only reduce costs (integrating all the big data components usually incurs high professional services fees) but will also reduce the barrier to entry to big data, thus further accelerating its adoption.

Big Data is still very much in its infancy and EMC is pushing to establish its footprint in this space. EMC Education announced the general availability of big data related courses last week, along with the EMC Data Science Architect (EMC DSA) certification. Greenplum is enjoying the early sweetness of the Big Data game and there will be more to come. I am certainly looking forward to sharing more on this plum (pun intended ;-)) of the data storage and data management excitement.

“Ugly Yellow Box” bought by private equity firm

Security is BIG business, probably even bigger than storage, and with more “sex” appeal and pizzazz! My friends are owners of 2 of the biggest security distributors in town, so I know. I am not much of a security guy, but the reason I write about Bluecoat is that this company has something close to my heart.

In the early 2000s, NetApp had a separate division that was not storage. They had a product called NetCache, a web proxy solution. It was a pretty decent product, and one of the competitors we frequently encountered in the field was an “ugly yellow box” called CacheFlow. Whenever we saw an “ugly yellow box” in a rack, we would immediately know it was a CacheFlow box. NetApp competed strongly with CacheFlow, partly because their CEO and founder, Brian NeSmith, as we NetAppians were told, was ex-NetApp. And there was some animosity between Brian and NetApp, to the point that I recall NetApp’s then-CEO, Dan Warmenhoven, declaring that “NetApp will bury CacheFlow!”, or something of that nature. At that point, circa 2001-2002, CacheFlow was indeed in a bit of a rut. They suffered heavy losses and were near bankruptcy. An old news article from Forbes confirmed Brian NeSmith’s near-bankruptcy adventure.

 

CacheFlow survived the rut, changed their name to Bluecoat Systems, and changed their focus from Internet caching to security. Know why they are known as “Bluecoat”? They are the policemen of the Internet, and policemen are men in blue coats. I found an old article from Network World about their change. And they decided not to paint their boxes yellow anymore. 😉

 

Eventually, it was CacheFlow who triumphed over NetApp. And the irony was that NetApp eventually sold the NetCache unit and its technology to Bluecoat in 2006. And hence, that is my account of the history of Bluecoat.

Yesterday, Bluecoat was in the history books again, but for a better reason. A private equity firm, Thoma Bravo, has put in USD$1.3 billion to acquire Bluecoat. News here and here.

Have a happy Sunday 😀

Gartner 3Q2011 WW ECB Disk Storage Market

Just after IDC released their numbers of their worldwide Disk Storage System Tracker (Read my blog) 10 days ago, Gartner released their Worldwide External Controller Based (ECB) Disk Storage Market report for Q3 of 2011.

The storage market remains resilient (for now), growing 10.4% in terms of revenue despite the hard economic conditions. The table below shows the top 7 storage vendors and how they compare to their Q2 numbers.

 

EMC remained at the top and gained a massive 3.6% jump in market share. Looks like they are firing on all cylinders and chugging like an unstoppable steam train. IBM gained 0.1% in second place as its stable of DS8000, XIV and Storwize V7000 takes shape. Even though IBM has been holding steady, I still think that their present storage lineup is staggered and lacks a seamless upgrade path for their customers.

NetApp, which I always term the “little engine that could”, is slowing down. They were badly hit in the last quarter, delivering lower-than-expected revenue numbers according to the analysts. Their stock took a tumble too. As quoted by Gartner, “NetApp’s third-quarter results reflect an overdependence on a few large customers, limited geographic coverage in high-growth countries and increased competition from Dell, EMC, HP and IBM in the midrange modular ECB disk array market segment.”

I wrote in my recent blog that NetApp has to start evolving from a pure-play storage vendor into a total storage and data management solution vendor. The recent rumours of NetApp’s interest in Commvault and Quantum would make a lot of sense if NetApp decides to make that move. Come on, NetApp! What are you waiting for?

HP came back strong in this report. They are in 4th place with 10.4% market share and hot on NetApp’s heels. After many months of nonsensical madness – the Leo Apotheker firing, trying to ditch the PC business, the killing of the WebOS tablet, the very public Oracle-HP spat – things are beginning to settle a bit under their new CEO, Meg Whitman. At the recent HP Discover conference in Vienna, it was reported that the HP storage team is gung-ho about what they have in their arsenal right now. They call it “The 4 Jewels of the HP Storage Crown”, which includes 3PAR, Ibrix, StoreOnce and LeftHand. They also leapfrogged over HDS and Dell in the recent Gartner Magic Quadrant (see below).

Kudos to HP and team.

HDS seems to be doing well, and so is Dell. But the Gartner numbers tell a different story. HDS lost market share and now shares 7.8% with Dell. Dell, despite its strong marketing of Compellent, could not make up for its losses after breaking off with EMC.

Fujitsu and Oracle complete the line-up.

My conclusion: HP and IBM are coming back; EMC is well and far ahead of everyone else; NetApp has to evolve; Dell is still lacking enterprise storage savvy despite having good technology; no comment about HDS.

Cloud Computing and it’s not iCloud

Steve Jobs was great at what he did, but when it comes to Cloud Computing, Jeff Bezos of Amazon is the one. And I believe Amazon Web Services (AWS) is bigger than Apple’s iCloud, both now and in the future. Why do I say that, knowing that the Apple fanboys could use me as target practice? Because I believe what Amazon is doing is the future of Cloud Computing. Jeff Bezos is a true visionary.

One thing we have to note is that we play different roles when it comes to Cloud Computing. There are Cloud Service Providers (CSPs) and there are enterprise subscribers. On a personal level, there are CSPs that cater to consumer-level services, and there are subscribers of this kind as well. The diagram below shows the needs from an enterprise perspective, for both providers and subscribers.

 

Also, we tend to see Amazon from a less enterprise-centric perspective, and they are probably better known for their engagement at the consumer level. But what Amazon is brewing could already be what Cloud Computing should be, and I don’t think Apple iCloud is quite there yet.

Amazon Web Services caters to the enterprise and the IT crowd, providing both Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS) through its delectable offerings:

  • Elastic Compute Cloud (EC2)
  • SimpleDB
  • Simple Storage Service (S3)
  • Elastic Block Store (EBS)
  • Elastic Beanstalk
  • CloudFormation
  • many more

And AWS has been operational and serving enterprise customers for 5-6 years now. Netflix and Zynga (of FarmVille fame) are some of AWS’s customers. This is something Apple iCloud does not have: a Cloud Computing ecosystem for enterprise customers. Apple iCloud does not offer PaaS or IaaS. Perhaps it is Apple’s vision not to get into the enterprise, but ultimately the world revolves around businesses, and businesses are adopting Cloud Computing. Many readers may disagree with what I say in this paragraph, but even at the consumer level, Amazon is putting the right moves in place, probably more so than Apple (more about this later).

But the recent announcement of the Kindle Fire, their USD$199 Android-based gadget, was to me the final piece of Amazon’s Phase I jigsaw – the move to conquer the Cloud Computing space. I read somewhere that the USD$199 Kindle Fire actually costs about USD$201.XX to manufacture. Apple’s iPad costs USD$499. So Amazon is making a loss on each gadget they sell. So what! It’s no big deal.

Let me share with you this table that will rattle your thinking a little bit. Remember this: Cloud Computing is defined as a “utility”. Cloud Computing is about services and content.

The table was taken from a recent Wired Magazine article. It featured the interview with Jeff Bezos. Go check out the interview. It’s very refreshing and humbling.

I hope the table is convincing enough to show that the device or the gadget doesn’t matter. Yes, Apple and Amazon have different visions when it comes to Cloud Computing, but if you take some time to analyze the comparison, Amazon does not lock you into buying expensive (but very good) hardware, unlike Apple.

Take, for instance, the last point. Apple promotes downloaded media while Amazon uses streamed media. If you think about it, that is what Cloud Computing should be, because the services and the content are a utility. Amazon is providing services and content as a utility. Apple’s thinking is more old-school, still very much the PC-era mentality. You have to download the applications onto your gadget before you can use them.

Even the Amazon Silk browser concept is more revolutionary than Apple’s Safari. The Silk browser offloads some of the processing to the Amazon Cloud, taking advantage of the power of the Amazon Cloud to do the processing for the user. Here’s a little video about the Amazon Silk browser.

Apple’s Safari is still very PC-centric, where most of the Web content has to be downloaded onto the browser to be viewed and processed. No doubt Amazon Silk also downloads content, but some of the processing, such as read-ahead and applet-processing functions, has been moved to the Amazon Cloud. That’s changing our paradigm. That’s Cloud Computing. And iCloud does not have anything like that yet.

Someone once told me that Cloud is about economics. How incredibly true! It is about having the lowest costs for both providers and consumers. It’s about bringing a motherlode of content that can be delivered to you over the network. Amazon has tons of digital books, music, movies, TV and computing power to sell to you. And they are doing it at a responsible pace, with low margins. With low margins, the barrier to entry is lower, which in turn accelerates Cloud Computing adoption. And Amazon is very good at that. Heck, they are selling their Kindle Fire at a loss.

Jeff Bezos has stressed that what they are doing is long term, much longer term than most. To me, Jeff Bezos is the better Cloud Computing visionary. I am sorry, but the reality is that Steve Jobs wanted high margins from the gadgets Apple sells to you. That is Apple’s vision for you.

 Photo courtesy of Wired magazine.

Storage must go on a diet

Nowadays, the capacity of hard disk drives (HDDs) is really big. 3TB is out and 4TB is on the horizon. What’s next?

For small-medium businesses in Malaysia, depending on their data requirements and applications, 3-10TB is pretty sufficient, with room to grow as well. Therefore, a 6TB requirement can be easily satisfied with 2 x 3TB HDDs.

If I were the customer, why would I buy a storage array, with the software licenses and other stuff that will not only increase my cost of equipment acquisition and data management, but also increase the complexity of my IT infrastructure? I could just slot HDDs into my existing server, RAID them with RAID-0 (not a good idea, but to save costs, most customers would do that) and I have a 6TB volume! It’s cheaper, easier to manage with Windows or Linux, and my system administrator doesn’t have to fuss about a lack of storage experience.

And RAID isn’t really keeping up with the tremendous growth of HDD capacity either. In fact, RAID is at risk. RAID (especially RAID 5/6) just cannot continue to provide the LUN or volume reliability and data availability, because it takes too damn long to rebuild the volume after the failure of a disk.

Back in the days when HDDs were less than 500GB, RAID-5 would still hold up, but after passing the 1TB mark, RAID-6 became more prevalent. Now that 1TB has ballooned to 3TB, even RAID-6 is on shaky ground. What’s next? RAID-7? ZFS has RAID-Z3, triple parity, but come on, how many vendors have that? With triple parity or stronger RAID (is there one?), the price of the storage array is going to get too costly.
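
A quick back-of-envelope calculation shows why rebuilds hurt. The rebuild rate below is my own assumption, and real rebuilds are slower because the array is still serving production I/O and RAID-5/6 must read every surviving drive to reconstruct the data:

    def rebuild_hours(capacity_tb, rebuild_mb_per_s):
        """Best-case time to rewrite one failed drive at a sustained rate."""
        bytes_total = capacity_tb * 10**12
        return bytes_total / (rebuild_mb_per_s * 10**6) / 3600

    for cap in (0.5, 1, 3, 4):
        print(f"{cap} TB drive at 100 MB/s: ~{rebuild_hours(cap, 100):.1f} hours")
    # 0.5 TB ~1.4 h, 1 TB ~2.8 h, 3 TB ~8.3 h, 4 TB ~11.1 h  (best case!)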

Experts have been speaking about parity declustering, but that’s something only a few vendors have right now. Panasas, founded by one of the forefathers of RAID, Garth Gibson, comes to mind. In fact, Garth Gibson and Mark Holland of Carnegie Mellon University’s Parallel Data Lab (PDL) presented a paper about parity declustering more than 10 years ago.

Let’s get back to our storage fatty. Yes, our storage is getting fat, obese, rotund or whatever you want to call it. And storage vendors have been pushing a concept in the hope that storage administrators and customers will take advantage of it. It is called Storage Optimization or Storage Efficiency.

Here are a few ways you can consider to put your storage on a diet.

  • Compression
  • Thin Provisioning
  • Deduplication
  • Storage Tiering
  • Tapes and SSDs

To me, compression has not taken the storage world by storm. But then again, there aren’t many vendors that tout compression as a feature for storage optimization. Most of them prefer to push the darling of data reduction, data deduplication, as the main feature to save more space. Theoretically, data deduplication makes more sense when the data is inactive and has a high occurrence of duplicated data. That is why secondary storage such as backup deduplication targets like Data Domain, HP StoreOnce and Quantum DXi can publish 20:1 ratios, and over time, that ratio can get even higher.
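
For those wondering where ratios like 20:1 come from, here is a minimal sketch of block-level deduplication (an illustration only, not how any particular product is implemented): identical chunks are stored once, and every repeat becomes a cheap reference.

    import hashlib, os

    def dedupe_ratio(data, chunk_size=4096):
        store, refs = {}, []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:      # new, unique chunk -> store it
                store[digest] = chunk
            refs.append(digest)          # duplicates become cheap references
        stored = sum(len(c) for c in store.values())
        return len(data) / stored

    # Backup-like data: mostly repeated blocks with a little unique data mixed in.
    data = b"A" * 4096 * 95 + os.urandom(4096 * 5)
    print(f"dedupe ratio ~ {dedupe_ratio(data):.1f}:1")   # roughly 17:1 here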

NetApp has also been pushing their A-SIS data deduplication on primary storage. Yes, it helps with storage savings on primary, but when higher data transfer rates and faster access to “manipulated” data (deduped or compressed) are needed, compression is likely the better choice for primary, active data.

So who has compression? NetApp ONTAP 8.0.1 has compression now, and IBM’s Storwize started out as a compression device before lending its name to the Storwize V7000. Read about IBM Storwize in my blog here. Dell has Ocarina Networks, which was recently unleashed. I am a big fan of Ocarina Networks and I wrote about the technology in my previous blog. EMC had compression during the Celerra days of DART, but I don’t hear much about it in their VNX. Compression is there, believe me, buried under all the loads of EMC marketing.

Thin Provisioning is now a must-have and standard feature of all storage vendors. What is Thin Provisioning? The diagram below shows you:

In the past, storage systems weren’t so intelligent. You asked for 10TB, you were given 10TB, and that 10TB was “deducted” from the storage capacity. That led to wastage and storage inefficiencies. Today, Thin Provisioning will give you 10TB, but storage capacity is consumed only as it is being used. The capacity is not pre-allocated as in the past. Thin provisioning is a great diet pill for bloated storage projects.
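
A toy sketch of the idea (purely illustrative, not any vendor’s implementation): the volume advertises the full 10TB to the host, but physical capacity is only drawn from the shared pool as data is actually written.

    class ThinVolume:
        def __init__(self, advertised_tb):
            self.advertised_tb = advertised_tb   # what the host sees
            self.consumed_tb = 0.0               # what is really allocated

        def write(self, tb):
            self.consumed_tb += tb               # allocate only on write

    vol = ThinVolume(advertised_tb=10)
    vol.write(1.5)                               # application writes 1.5 TB
    print(vol.advertised_tb, "TB provisioned,", vol.consumed_tb, "TB consumed")
    # 10 TB provisioned, 1.5 TB consumed -> the other 8.5 TB stays in the pool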

Another up-and-coming feature is storage tiering. Storage tiering, when associated with storage optimization, should include hierarchical storage management (HSM) and tape-out as well. Storage optimization should not be offered only within the storage array itself. Storage tiering within the storage array is available with most vendors – IBM EasyTier, EMC FAST2, Dell Fluid Data Management and many others. But what about data being moved out of the storage array? What about reducing the capacity of the data online or near-line? Why not put it offline if there isn’t a need for it?

I term this as Active Archiving, something I learned while I was at EMC. Here’s a look at EMC’s style of Active Archiving:

Active Archiving promotes the concept of data archiving and is not unique to EMC. Almost all storage vendors, either natively or with 3rd-party vendors, can perform fairly efficient data archiving in one way or another. One piece of software that I like (and it is not unique!) is Quantum StorNext. Here’s a video of how Quantum StorNext helps reduce the fat of the storage.

With Quantum StorNext’s single-copy sharing across multiple disparate OSes, there are fewer duplicate files in storage as well.

Tapes have been getting a bad name in the past few years. They have been repositioned and repurposed as an archive medium rather than a backup medium. But tape is the greenest and most powerful storage diet pill around. And we should not discount tape, because tape is fighting back. Pretty soon you will be hearing about the Linear Tape File System (LTFS). In a nutshell, LTFS allows you to use tape almost as if it were a hard disk. You can drag and drop files from your server to the tape, see the list of saved files using a standard operating system directory (no backup software catalog needed), and use point-and-click to restore. How cool is that!

And Solid State Drives (SSDs) make sense as well.

There are times when we need IOPS, and with spinning drives, we have to set up many disk spindles to achieve the IOPS that we want. For example, take the diagram below from the godfather of storage, Greg Schulz:

The set of 16 spinning HDDs on the left can only deliver 3,520 IOPS. The problem is, we have wasted a lot of disk space, as seen in the diagram below. This design, which most customers would be accustomed to, may look cheaper but in actual fact is NOT.

If the price of a Fibre Channel HDD is RM2,000, 16 of them would come to RM32,000.00. That is not inclusive of additional power, cooling, rack space and data management costs. Now assume an SSD costs 5 times more than a Fibre Channel HDD. SSDs are capable of delivering very high IOPS; here I am assuming a modest 5,000 IOPS per SSD. With just 2 SSDs (as the design on the right suggests), the total cost is only RM20,000, with greater performance headroom to grow, and savings in data management, power and cooling.
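
Here is the same back-of-envelope arithmetic in a few lines, so you can plug in your own prices and IOPS figures (mine are assumptions, not vendor quotes):

    hdd_iops, hdd_price = 220, 2_000        # RM per 15K FC drive, ~220 IOPS each
    ssd_iops, ssd_price = 5_000, 10_000     # 5x the HDD price, modest SSD IOPS

    hdds, ssds = 16, 2
    print("HDD design:", hdds * hdd_iops, "IOPS for RM", hdds * hdd_price)
    print("SSD design:", ssds * ssd_iops, "IOPS for RM", ssds * ssd_price)
    # HDD design: 3520 IOPS for RM 32000
    # SSD design: 10000 IOPS for RM 20000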

Folks, consider SSDs as part of your storage diet plan.

All these features are available, in whole or in part, as part of the storage technology offerings that are out there. With all that said, are you doing something about it? Get off your lazy bum and start managing your storage and put your storage on a diet!!!

Hated GUI killing Ubuntu

OK, this is off-topic. Not my usual storage news but I thought I share this with you.

I am a Linux enthusiast. I play around with Linux – mostly Fedora and Red Hat flavoured distros. For the past 2 years, one of the things I hated was the rise of Ubuntu. I don’t know why, but I just didn’t like the distro. Ubuntu, based on Debian, was the darling of the Linux desktop world. Perhaps I am a server guy, but I just didn’t like Ubuntu. A few years ago, I won a Dell Latitude 2100 with Ubuntu pre-installed. I played around with it for a few days (hated it) and decided to switch to Fedora 13 after that.

So, as Ubuntu’s star waned, my interest was piqued by the news. According to DistroWatch, which tracks popular Linux distros based on hits per day, Ubuntu is steadily on the decline. Here’s a look at the latest DistroWatch numbers for the top-10 Linux distros:

The decline is likely caused by the Ubuntu Unity GUI, which replaced the likable GNOME interface in the Ubuntu 11.04 Natty Narwhal release. The current version, 11.10 Oneiric Ocelot, is taking a lot of hits of the wrong kind. Ubuntu has dropped from the top spot and is now down to #4.

Here’s a few screenshots of the Unity interface in Natty Narwhal.

 

I am pleasantly surprised that a GUI could cause so much harm to a Linux distro, but judging by the number of haters out there, I guess the Unity GUI is killing Ubuntu’s popularity. Let’s see how Ubuntu will react in version 12.04, Precise Pangolin.

The top distro is now Linux Mint, another Debian derivative. I have not tried Linux Mint, but I have been playing around with openSUSE 12.1. Not bad, buggy, but not bad.

I am still waiting to start my Fedora 16 download – 3.2GB baby over the Jaring SOMAport link. One day, but not today!

One-stop shop matters

Would you buy fruit from a dedicated fruit seller, or would you go to a hypermarket to get your fruit? It depends on your preference, but it is more likely that you would go to a hypermarket to do your shopping. You might need some accompanying stuff while you are at the hypermarket. There will be ideas stirring in your mind that you might need this or that while planning your fruit shopping.

The “ideas stirring in your mind” are what concepts like hypermarkets play on. They mess around with your thinking and play with your psychological side, because we are human beings. We are driven by desire and convenience.

In storage, this whole psychological game comes into play as well in the customer’s purchasing habits. If the customer is purchasing storage from one vendor, he/she might as well get the rest of the data management solutions from the same vendor. The vendors would pitch easy, cost-effective, seamless, proven and other well-received words to woo the customer. And the key ingredient is INTEGRATION.

All solutions these days are complex, and the integration needed to get all the components to work together is not easy. I have been working on a private cloud data appliance for almost 2 months now, and it’s not as seamless or as easy as it seems. According to the whitepaper, everything was rosy and dandy, but when it comes down to ground zero, even the vendors themselves had a hard time doing the integration. And this drives up costs, resources and time.

That is why EMC has become a behemoth in the storage industry, being an A-to-Z one-stop shop for everything in data storage and management, for everyone. That is why IBM and HP are able to leverage their server business and their other solutions and services portfolios to entice customers to buy their products. That is why Oracle wants to own the whole bloody application stack in their Exadata, to sell more Oracle database licenses. Pure-play storage vendors like NetApp and HDS, who prefer to work on partnerships, could be feeling the heat of late.

In the latest IDC quarterly worldwide disk storage systems tracker (that’s a mouthful), NetApp is the prominent one mentioned as “losing ground”. Here’s a look at a table comparing past quarters’ results.

It is difficult to quantify integration costs, because there are many intangible and unseen costs and impacts. To pacify customers’ fears and increase their confidence in the total data storage and management solution, marketing initiatives such as whitepapers, reference architectures, webcasts, social media, social business networking, demos, proofs-of-concept (POCs) and many more are tools of the trade that could tip a customer towards a vendor’s solution.

I believe NetApp is beginning to realize that. And rumours are swirling in the industry that NetApp will acquire strong solutions such as Commvault and Quantum. It makes sense. NetApp is in need of a strong data protection solution in which it has a say in the vision and direction of the software. NetApp needs a strong data deduplication solution, which Quantum has in its DXi series. Symantec could be an acquisition target as well, as the security and data management giant’s stock has stagnated in the stock market.

NetApp itself could be an acquisition target as well, with IBM, Cisco and HP the possible suitors. NetApp’s solutions are a great solution set for IBM, who really needs to do something about their staggered storage portfolio. HP might have chewed off a mouthful with 3PAR, but HP has been bad news for the last 2 quarters, no thanks to its on-and-off fiasco of ditching its PC business and other crazy stunts like HP-versus-Oracle and their ex-CEO, Leo Apotheker. Cisco could bet on NetApp too. Both companies have a strong relationship, but Cisco is drying up. They are becoming a laggard in the networking industry, and companies like Juniper are hitting back … hard!

All this jousting and shuffling is driving the consolidation of the storage industry. The top six players – EMC, NetApp, IBM, HP, HDS and Dell – own more than 80% of the total storage market share in terms of revenue. As the data storage and management world becomes more complex, and the ubiquity of cloud computing demands absolute uptime with no room for errors, the one-stop shop makes sense. One throat to choke … as they say.

Magic on storage players

It’s that time of the year again, when Gartner releases its Magic Quadrant for the block-access, external controller-based, mid-range and high-end modular disk arrays market. This particular report is very important because it represents the mainstay of the overall storage industry, viewed from a more qualitative angle. Whereas the other charts and reports work with statistics and numbers, this is the chart that everyone in the industry flocks to. The Gartner Magic Quadrant (MQ) is the storage industry’s indicator of who the leaders are, who the visionaries are, who the execution wizards are and who the laggards (also known as niche players) are.

So, this time around, who’s in the Leaders Quadrant?

The perennial players in the Leaders’ Quadrant are EMC, IBM, NetApp, HP, Dell and HDS. In my previous blog, I shared with you the IDC figures about market shares, but the Gartner MQ shows a more subtle side, one that perhaps carries more weight with organizations.

From the IDC numbers announced previously, we have seen Dell taking a beating. They have lost market share, and similarly, in this latest Gartner MQ, they have lost some of the significance of their influence as well. Everyone expected their Compellent solution to be robust, and having EqualLogic, Ocarina and Exanet in their stable should strengthen their presence in the storage industry. Surprisingly, Dell lost ground on both the statistically driven IDC market numbers and this Gartner MQ. Perhaps they were too hasty to dump EMC a few months ago?

Gartner also reported that HP has made a significant leap in the Leaders’ Quadrant. It has leapfrogged over HDS and IBM when comparing their positions in Gartner’s MQ chart. This could be coming from their concerted effort to pitch their Converged Infrastructure, a vision that, in my opinion, simplifies computing. HP Malaysia shared their vision with me a few months ago, and I was impressed. What I was not very impressed with then, and even now, is that their storage solutions story is still staggered, lacking the gel. Perhaps it is work in progress for HP: the 3PAR, the IBRIX and the EVA. But one thing’s for sure: they are slowly but surely getting the StoreOnce story right, and that’s good news for customers. I did a review of HP StoreOnce technology a few months ago.

Perhaps it’s time for HP to ditch their VLS deduplication, which to me, confuses customers. By the way, HP VLS is an OEM from Sepaton. (Sepaton is “No tapes” spelled backwards)

Here’s a glimpse of last year’s Magic Quadrant.

 

In the Niche Quadrant, there are a few players making waves as well. 2 companies to watch out for are Huawei (they dropped Symantec 2 weeks ago) and Nexsan. Nexsan has been beefing up its marketing of late, and I often see them on mailing lists and in ads on some websites I visit.

But the one to watch will be Huawei. This is a company with deep pockets, hiring the best in the storage industry, and it also has a very strong domestic market in China. In the next 2-3 years, Huawei could emerge as a strong contender to the big boys. So watch out!

The Gartner Magic Quadrant is indeed weaving its magic, and this time around the magic is good for HP.