Considerations of Hadoop in the Enterprise

I am guilty. I have not been tendering this blog for quite a while now, but it feels good to be back. What have I been doing? Since leaving NetApp 2 months or so ago, I have been active in the scenes again. This time I am more aligned towards data analytics and its burgeoning impact on the storage networking segment.

I was intrigued by an article posted by a friend of mine in Facebook. The article (circa 2013) was titled “Never, ever do this to Hadoop”. It described the author’s gripe with the SAN bigots. I have encountered storage professionals who throw in the SAN solution every time, because that was all they know. NAS, to them, was like that old relative smelled of camphor oil and they avoid NAS like a plague. Similar DAS was frowned upon but how things have changed. The pendulum has swung back to DAS and new market segments such as VSANs and Hyper Converged platforms have been dominating the scene in the past 2 years. I highlighted this in my blog, “Praying to the Hypervisor God” almost 2 years ago.

I agree with the author, Andrew C. Oliver. The “locality” of resources is central to Hadoop’s performance.

Consider these 2 models:

moving-compute-storage

In the model on your left (Moving Data to Compute), the delivery process from Storage to Compute is HEAVY. That is because data has dependencies; data has gravity. However, if you consider the model on your right (Moving Compute to Data), delivering data processing to the storage layer is much lighter. Compute or data processing is transient, and the data in the compute layer is volatile. Once compute’s power is turned off, everything starts again from a clean slate, hence the volatile stage.

Continue reading

The dark ages of data is coming

A recent report intrigued me. Given the recent uprising of data, data and more data, things are getting a bit absurd about the voluminous data we are collecting and storing. The flip is that we might need all these data for analytics and getting more insight from the data.

The Veritas Darkberg report revealed that a very large percentage of the data collected and stored by organizations are useless data, unknown and unused. I captured a snapshot of the report below:

Screen Shot 2015-11-08 at 8.03.05 AM

From the screenshot above, it shows 54% of the landscape surveyed is dark data, unseen and clogging up the storage. And in an instance, the Darkberg (cross of “Dark” and “Iceberg”) report knocked a lot of sense into this whole data acquisition frenzy we are going through right now.

Continue reading

The transcendence of Data Fabric

The Register wrote a damning piece about NetApp a few days ago. I felt it was irresponsible because this is akin to kicking a man when he’s down. It is easy to do that. The writer is clearly missing the forest for the trees. He was targeting NetApp’s Clustered Data ONTAP (cDOT) and missing the entire philosophy of NetApp’s mission and vision in Data Fabric.

I have always been a strong believer that you must treat Data like water. Just like what Jeff Goldblum famously quoted in Jurassic Park, “Life finds a way“, data as it moves through its lifecycle, will find its way into the cloud and back.

And every storage vendor today has a cloud story to tell. It is exciting to listen to everyone sharing their cloud story. Cloud makes sense when it addresses different workloads such as the sharing of folders across multiple devices, backup and archiving data to the cloud, tiering to the cloud, and the different cloud service models of IaaS, PaaS, SaaS and XaaS.

Continue reading

How valuable is your data anywhere?

I was a speaker at the Data Management and Document Control conference 2 weeks’s ago. It was a conference aimed at the Oil & Gas industry, and my presentation was primarily focused on Data in Exploration & Production (E&P) segment of the industry. That’s also the segment that brings in the mega big bucks!

The conversations with the participants have validated and strengthened the fact that no matter how we talk about how valuable data is to the organization, how data is the asset of the organization, the truth is most organization SUCKS big time when it comes to data management. The common issues faced in the E&P data management in Oil & Gas are probably quite similar to many other industries. For the more regulated industries such as banking, financial institutions, governments and telecommunications, data management, I would assume, is a tad better.

The fact of the matter is there little technology change in the past decade in data storage, data protection and data movement. There are innovations from a technology point of view but most technology innovations do not address the way data could be better managed, especially from a data consolidation point of view.

Continue reading

Swiss army of data management

Back in 2000, before I joined NetApp, I bought one of my first storage technology books. It was “The Holy Grail of Data Storage Management” by Jon William Toigo. The book served me very well, because it opened up my eyes about the storage networking and data management world.

I mean, I have been doing storage for 7 years before the year 2000, but I was an implementation and support engineer. I installed my first storage arrays in 1993, the trusty but sometimes odd, SPARCstorage Array 1000. These “antiques” were running 0.25Gbps Fibre Channel, and that nationwide bank project gave me my first taste and insights of SAN. Point-to-point, but nonetheless SAN.

Then at Sun from 1997-2000, I was implementing the old Storage Disk Packs with FastWide SCSI, moving on to the A5000 Photons (remember these guys?) and was trained on the A7000, Sun’s acquisition of Encore way back in the late nineties. Then there was “Purple”, the T300s which I believe came from the acquisition of MaxStrat.

The implementation and support experience was good but my world opened up when I joined NetApp in mid-2000. And from the Jon Toigo’s book, I learned one of the most important lessons that I have carried with me till this day – “Data Storage Management is 3x more expensive that the data storage equipment itself“. Given the complexity of the data today compared to the early 2000s, I would say that it is likely to be 4-5x more expensive.

And yet, I am still perplexed that many customers and prospects still cannot see the importance and the gravity of data storage management, and more precisely, data management itself.

A couple of months ago, I had to opportunity to work on an RFP for project in Singapore. The customer had thousands of tapes storing digital media files in addition to tens of TBs running on IBM N-series storage (translated to a NetApp FAS3xxx). They wanted to revamp their architecture, and invited several vendors in Singapore to propose. I was working for a friend, who is an EMC reseller. But when I saw that tapes figured heavily in their environment, and the other resellers were proposing EMC Isilon and NetApp C-Mode, I thought that these resellers were just trying to stuff a square peg into a round hole. They had not addressed the customer’s issues and problems at all, and was just merely proposing storage for the sake of storage capacity. Sure, EMC Isilon is great for the media and entertainment business, but EMC Isilon is not the data management solution for this customer’s situation. Neither was NetApp with the C-Mode solution.

What the customer needed to solve was a data management solution, one that involved

  • Single namespace for video editors and programmers, regardless of online disk storage or archived tape storage
  • Transparent and automated storage tiering and addressing the value of the data to the storage media
  • A backup tier which kept a minimum 2 recent copies for file restoration in case of disasters
  • An archived tier which they could share with their counterparts in other regions
  • A transparent replication tier which would allow them to implement a simplified disaster recovery mechanism with their counterparts in Japan and China

And these were the key issues that needed to be addressed, not the scale-out, usual snapshot mechanism. These features are good for a primary, production storage infrastructure, but this customer’s business operations had about 70-80% data and files which were offline in tapes. I took the liberty to advise my friend to look into Quantum StorNext, because the solution could solve the business problem NOT solving it from an IT point of view. Continue reading

Going Ga Ga over Storage Networking

Before you start thinking that I am ripping off Lady Gaga, this blog’s name of “Storage Gaga” is NOT from Lady Gaga. It’s from Queen’s Radio Ga Ga song which I happen to be listening in my car.

Why Ga Ga? Ga Ga in the Free Dictionary (link: http://www.thefreedictionary.com/gaga) means crazy over something (at least one of the meanings anyway). That’s what I am. Since leaving my last job – which was on Tuesday (July 19th 2011) this week – I want to do more for storage networking and data management. I want to share things I find out, information that I have learned and so on.

So watch this space for more info … more on the way.

p/s. This rainy morning, I am going to arrange and organize all my computer books. It’s going to be fun!