I am guilty. I have not been tending to this blog for quite a while now, but it feels good to be back. What have I been doing? Since leaving NetApp 2 months or so ago, I have been active in the scene again. This time I am more aligned towards data analytics and its burgeoning impact on the storage networking segment.
I was intrigued by an article posted by a friend of mine on Facebook. The article (circa 2013) was titled “Never, ever do this to Hadoop”. It described the author’s gripe with the SAN bigots. I have encountered storage professionals who throw in the SAN solution every time, because that was all they knew. NAS, to them, was like that old relative who smelled of camphor oil, and they avoided NAS like the plague. Similarly, DAS was frowned upon, but how things have changed. The pendulum has swung back to DAS, and new market segments such as VSANs and hyper-converged platforms have been dominating the scene in the past 2 years. I highlighted this in my blog, “Praying to the Hypervisor God”, almost 2 years ago.
I agree with the author, Andrew C. Oliver. The “locality” of resources is central to Hadoop’s performance.
Consider these 2 models:
In the model on your left (Moving Data to Compute), the delivery process from Storage to Compute is HEAVY. That is because data has dependencies; data has gravity. However, if you consider the model on your right (Moving Compute to Data), delivering data processing to the storage layer is much lighter. Compute, or data processing, is transient, and the data in the compute layer is volatile. Once the compute layer’s power is turned off, everything starts again from a clean slate, hence its volatile state.
This is the philosophy of Hadoop: moving data processing closer to where the bulk of the data is, where the storage resides, plays to the strengths and performance of Hadoop.
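To make the “compute to data” idea a bit more concrete, here is a minimal sketch using the standard Hadoop FileSystem Java API (the file path and cluster settings are hypothetical) that asks the NameNode which DataNodes hold each block of a file. The MapReduce scheduler consults the same locality information so that, wherever possible, a map task lands on a node that already has the block instead of dragging the block across the network.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
    public static void main(String[] args) throws Exception {
        // Hypothetical HDFS path; substitute a file that exists in your cluster.
        Path file = new Path("/data/example.csv");

        // Picks up core-site.xml / hdfs-site.xml from the classpath,
        // assuming fs.defaultFS points at the cluster's NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        FileStatus status = fs.getFileStatus(file);
        // Ask the NameNode which DataNodes hold each block of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        for (int i = 0; i < blocks.length; i++) {
            // The scheduler (JobTracker in Hadoop 1.0, YARN in 2.0) uses this
            // host list to place a map task on a node that already holds the
            // block, rather than pulling the block across the network.
            System.out.println("Block " + i + " -> " + String.join(", ", blocks[i].getHosts()));
        }
        fs.close();
    }
}
```

When a block-local placement is not possible, the scheduler falls back to a rack-local node before going off-rack, which is why locality is so central to Hadoop’s performance.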
However, using server-based storage (aka DAS) is not always efficient, especially when it is adopted in the Enterprise. Hadoop clusters were designed with COTS (commercial off-the-shelf) x86 servers in mind. Along with these servers are the internal HDDs (hard disk drives), which are typically grouped and RAIDed together with common LSI HBAs such as the SAS 9211-8i/4i or SAS 9207-8i/4i, or RAID cards like the LSI MegaRAID 9240 or 9260. Multiples of these servers, or nodes, make up the Hadoop Cluster.
In order to improve the reliability of the data which resides in each of these server nodes (aka DataNodes in Hadoop speak), the data is broken down into blocks of a configured size in HDFS (Hadoop Distributed File System), typically ranging from 64 to 512MB. Then, by default, HDFS makes 2 more copies of each block and algorithmically places them on other DataNodes in the Hadoop Cluster. This behaviour of making 3 copies is defined in the /etc/hadoop/conf/hdfs-site.xml file, under the dfs.replication property.
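For reference, here is a minimal excerpt of what that file could look like. The 128MB block size shown is just an illustrative value within the range mentioned above, and dfs.blocksize is the Hadoop 2.x name of the block size property.

```xml
<!-- /etc/hadoop/conf/hdfs-site.xml (excerpt) -->
<configuration>
  <property>
    <!-- Number of copies HDFS keeps of every block (default is 3) -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <!-- Block size in bytes; 134217728 = 128MB (illustrative value) -->
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```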
This can put a considerable strain on the network as well as on the server nodes, because they have to contend for the disk I/O and CPU resources which are much needed for the parallel processing of MapReduce.
Since we are talking about being cheap using COTS servers, RAID rebuilds after a disk failure can impact the performance of DataNodes as well. The LSI HBAs can offload the rebuilds, but the impact is confined to each node. This causes further degradation, grinding the Hadoop Cluster down even more.
One thing I have stressed before is that I/O drags down performance. If you don’t do it well, you had better be prepared to handle the raucous dissidents. Therefore, taking a more modern approach to storage in a Hadoop Cluster to offset the impact of I/O is fundamental to scaling a Hadoop Cluster in an Enterprise environment.
I especially gravitate towards a direct-attached approach using external SAS HBAs to create a more scalable and more I/O-deterministic approach to HDFS storage. NetApp has a well-documented validated architecture with their E-Series storage. It is a shared-nothing architecture, and the LUNs/volumes are presented to the Hadoop DataNodes exactly the same way as the internal “disks” are in most Hadoop Cluster COTS designs.
It also takes a “building block” approach. This resonates well with Enterprise infrastructure architects, because it is simple and offers sanity and calmness. We storage people like to work with “blocks” all the time. 😉
Here’s a view of NetApp’s approach to the Hadoop HDFS building block:
Each DataNode is connected via a 6Gbps or 12Gbps SAS connection, taking the I/O duties away from the internal direct-attached storage of each DataNode. This means that the impact of RAID rebuilds is taken away from the DataNode, allowing the Task Tracker (Hadoop 1.0) or the Application Master/Node Manager (Hadoop 2.0) to concentrate fully on its MapReduce jobs.
The E-Series is configured with 8 independent RAID 5 groups in a 7+1 configuration, giving the Hadoop Cluster much more resilient RAID protection than the typical DAS configuration. Here’s a look at the RAID group setup in the E-Series configuration.
The design also advocates 4 DataNodes per E-Series storage array. This can represent savings, because consolidating at a 4:1 fan-out ratio is more economical than the traditional DAS approach.
There are many other benefits in the above model as well. Data footprint reduction features such as deduplication or compression might help too (although I have not done much research here to affirm this).
As Enterprises rapidly accept Hadoop into their ecosystems, operational efficiency plays a big role. We cannot just duly accept that Hadoop Clusters should be confined to the DAS design they were born with. They have to adapt, and this blog attempts to associate some storage best practices and nuances with assimilating Hadoop Clusters into the Enterprise.
NetApp is going big with 3rd Platform Enterprise applications, especially in Data Analytics. The E-Series, together with the flash-powered EF-Series, is now being refreshed with a new SANtricity OS to address this fast-growing market segment.
It is a great time to see both the Data Processing (Compute) and the Data Management (Storage) layers playing so well together.
In the paragraph that begins “Since we are talking about being cheap using COTS servers”, you demonstrate that you do not understand how Hadoop recovers data. There is no RAID (and you don’t need RAID-capable HBAs). Disks are never rebuilt; it is only data blocks that are re-copied. Since the data blocks are distributed throughout the cluster, the time to recover the data blocks of a lost disk goes down as the size of the cluster goes up. And thus the vulnerability window for another loss also shrinks.
Using RAID, there is a huge vulnerability window while a drive is being rebuilt (which can take many hours). And even if the drive wasn’t full, you typically pay the entire rebuild cost anyway.
However, I agree that using external DAS makes a lot of sense – see the DriveScale product. But you don’t need RAID or smart JBODs, only dumb ones.