The reverse wars – DAS vs NAS vs SAN

It has been quite an interesting 2 decades.

In the beginning (starting in the early to mid-90s), SAN (Storage Area Network) was the dominant architecture. DAS (Direct Attached Storage) was on the wane as the channel-like throughput of Fibre Channel protocol coupled by the million-device addressing of FC obliterated parallel SCSI, which was only able to handle 16 devices and throughput up to 80 (later on 160 and 320) MB/sec.

NAS, defined by CIFS/SMB and NFS protocols – was happily chugging along the 100 Mbit/sec network, and occasionally getting sucked into the arguments about why SAN was better than NAS. I was already heavily dipped into NFS, because I was pretty much a SunOS/Solaris bigot back then.

When I joined NetApp in Malaysia in 2000, that NAS-SAN wars were going on, waiting for me. NetApp (or Network Appliance as it was known then) was trying to grow beyond its dot-com roots, into the enterprise space and guys like EMC and HDS were frequently trying to put NetApp down.

It’s a toy”  was the most common jibe I got in regular engagements until EMC suddenly decided to attack Network Appliance directly with their EMC CLARiiON IP4700. EMC guys would fondly remember this as the “NetApp killer“. Continue reading

Has Object Storage become the everything store?

I picked up a copy of latest Brad Stone’s book, “The Everything Store: Jeff Bezos and the Age of Amazon at the airport on my way to Beijing last Saturday. I have been reading it my whole time I have been in Beijing, reading in awe about the turbulent ups and downs of Amazon.com.

The Everything Store cover

In its own serendipitous ways, Object-based Storage Devices (OSDs) have been floating in my universe in the past few weeks. Seems like OSDs have been getting a lot of coverage lately and suddenly, while in the shower, I just had an epiphany!

Are storage vendors now positioning Object-based Storage Devices (OSDs) as Everything Store?

Continue reading

4TB disks – the end of RAID

Seriously? 4 freaking terabyte disk drives?

The enterprise SATA/SAS disks have just grown larger, up to 4TB now. Just a few days ago, Hitachi boasted the shipment of the first 4TB HDD, the 7,200 RPM Ultrastar™ 7K4000 Enterprise-Class Hard Drive.

And just weeks ago, Seagate touted their Heat-Assisted Magnetic Recording (HAMR) technology will bring forth the 6TB hard disk drives in the near future, and 60TB HDDs not far in the horizon. 60TB is a lot of capacity but a big, big nightmare for disks availability and data backup. My NetApp Malaysia friend joked that the RAID reconstruction of 60TB HDDs would probably finish by the time his daughter finishes college, and his daughter is still in primary school!.

But the joke reflects something very serious we are facing as the capacity of the HDDs is forever growing into something that could be unmanageable if the traditional implementation of RAID does not change to meet such monstrous capacity.

Yes, RAID has changed since 1988 as every vendor approaches RAID differently. NetApp was always about RAID-4 and later RAID-DP and I remembered the days when EMC had a RAID-S. There was even a vendor in the past who marketed RAID-7 but it was proprietary and wasn’t an industry standard. But fundamentally, RAID did not change in a revolutionary way and continued to withstand the ever ballooning capacities (and pressures) of the HDDs. RAID-6 was introduced when the first 1TB HDDs first came out, to address the risk of a possible second disk failure in a parity-based RAID like RAID-4 or RAID-5. But today, the 4TB HDDs could be the last straw that will break the camel’s back, or in this case, RAID’s back.

RAID-5 obviously is dead. Even RAID-6 might be considered insufficient now. Having a 3rd parity drive (3P) is an option and the only commercial technology that I know of which has 3 parity drives support is ZFS. But having 3P will cause additional overhead in performance and usable capacity. Will the fickle customer ever accept such inadequate factors?

Note that 3P is not RAID-7. RAID-7 is a trademark of a old company called Storage Computer Corporation and RAID-7 is not a standard definition of RAID.

One of the biggest concerns is rebuild times. If a 4TB HDD fails, the average rebuild speed could take days. The failure of a second HDD could up the rebuild times to a week or so … and there is vulnerability when the disks are being rebuilt.

There are a lot of talks about declustered RAID, and I think it is about time we learn about this RAID technology. At the same time, we should demand this technology before we even consider buying storage arrays with 4TB hard disk drives!

I have said this before. I am still trying to wrap my head around declustered RAID. So I invite the gurus on this matter to comment on this concept, but I am giving my understanding on the subject of declustered RAID.

Panasas‘ founder, Dr. Garth Gibson is one of the people who proposed RAID declustering way back in 1999. He is a true visionary.

One of the issues of traditional RAID today is that we still treat the hard disk component in a RAID domain as a whole device. Traditional RAID is designed to protect whole disks with block-level redundancy.  An array of disks is treated as a RAID group, or protection domain, that can tolerate one or more failures and still recover a failed disk by the redundancy encoded on other drives. The RAID recovery requires reading all the surviving blocks on the other disks in the RAID group to recompute blocks lost on the failed disk. In short, the recovery, in the event of a disk failure, is on the whole object and therefore, a entire 4TB HDD has to be recovered. This is not good.

The concept of RAID declustering is to break away from the whole device idea. Apply RAID at a more granular scale. IBM GPFS works with logical tracks and RAID is applied at the logical track level. Here’s an overview of how is compares to the traditional RAID:

The logical tracks are spread out algorithmically spread out across all physical HDDs and the RAID protection layer is applied at the track level, not at the HDD device level. So, when a disk actually fails, the RAID rebuild is applied at the track level. This significant improves the rebuild times of the failed device, and does not affect the performance of the entire RAID volume much. The diagram below shows the declustered RAID’s time and performance impact when compared to a traditional RAID:

While the IBM GPFS approach to declustered RAID is applied at a semi-device level, the future is leaning towards OSD. OSD or object storage device is the next generation of storage and I blogged about it some time back. Panasas is the leader when it comes to OSD and their radical approach to this is applying RAID at the object level. They call this Object RAID.

With object RAID, data protection occurs at the file-level. The Panasas system integrates the file system and data protection to provide novel, robust data protection for the file system.  Each file is divided into chunks that are stored in different objects on different storage devices (OSD).  File data is written into those container objects using a RAID algorithm to produce redundant data specific to that file.  If any object is damaged for whatever reason, the system can recompute the lost object(s) using redundant information in other objects that store the rest of the file.

The above was a quote from the blog of Brent Welch, Panasas’ Director of Software Architecture. As mentioned, the RAID protection of the objects in the OSD architecture in Panasas occurs at file-level, and the file or files constitute the object. Therefore, the recovery domain in Object RAID is at the file level, confining the risk and damage of data loss within the file level and not at the entire device level. Consequently, the speed of recovery is much, much faster, even for 4TB HDDs.

Reliability is the key objective here. Without reliability, there is no availability. Without availability, there is no performance factors to consider. Therefore, the system’s reliability is paramount when it comes to having the data protected. RAID has been the guardian all these years. It’s time to have a revolutionary approach to safeguard the reliability and ensure data availability.

So, how many vendors can claim they have declustered RAID?

Panasas is a big YES, and they apply their intelligence in large HPC (high performance computing) environments. Their technology is tried and tested. IBM GPFS is another. But where are the rest?

 

3TB Seagate – a performance sloth

I can’t get home. I am stuck here at the coffee shop waiting out the traffic jam after the heavily downpour an hour ago.

It has been an interesting week for me, which began last week when we were testing the new Seagate 3TB Constellation ES.2 hard disk drives. It doesn’t matter if it was SAS or SATA because the disks were 7,200 RPM, and basically built the same. SAS or SATA is merely the conduit to the disks and we were out there maneuvering the issue at hand.

Here’s an account of  testing done by my team. My team has been testing the drives meticulously, using every trick in the book to milk performance from the Seagate drives. In the end, it wasn’t the performance we got but more like duds from Seagate where these type of drives are concerned.

How did the tests go?

We were using a Unix operating system to test the sequential writes on different partitions of the disks, each with a sizable GB per partition. In one test, we used 100GB per partition. With each partition, we were testing the outer cylinders to the inner cylinders, and as the storage gurus will tell you, the outer rings runs at a faster speed than the inner rings.

We thought it could be the file system we were using, so we switched the sequential writes to raw disks. We tweaked the OS parameters and tried various combinations of block sizes and so on. And what discovered was a big surprise.

The throughput we got from the sequential writes were horrible, started out with MB/sec lower almost 25% lower than a 2TB Western Digital RE4 disk, and as it went on, the throughput in the inner rings went down to single digit MB/sec. According to reliable sources, the slowest published figures by Seagate were in the high 60’s for MB/sec but what we got were close to 20+MB/sec. The Western Digital RE4 was giving out consistent throughput numbers throughout the test. We couldn’t believe it!

We scoured the forums looking for similar issues, but we did not find much about this.This could be a firmware bug. We are in the midst of opening an escalation channel to Seagate to seek explanation. I would like to share what we have discovered and the issue can be easily reproduced. For customers who have purchased storage arrays with 2 or 3TB Seagate Constellation ES/2 drives, please take note. We were disappointed with the disks but thanks to my team for their diligent approach that resulted in this discovery.