Boosting Solid States beyond SATA

Lately, I have been getting deeper and deeper into low-level implementation related to storage technologies. In my previous blog, I was writing my learning adventure with Priority Flow Control (PFC) and intend to further the Data Center Bridging concepts with future blog entries.

Before I left for Sydney for a holiday last week, I got sidetracked into exciting stuff that’s happening in my daily encounters with friends and new friends. 2 significant storage related technologies fell onto my lap. One is NVMe (Non-Volatile Memory express) and the other FPGA (Field Programmable Gate Array).

While this blog is going to be about NVMe, I actually found FPGA much more exciting to me. Through conversations, I found that there are 2 “biggies” in the FPGA world, and they are designed and manufactured by Xilink and Altera. I admit that I have not done my homework on FPGA yet, having just returned from Sydney last night. I will blog about FPGA in future blogs.

But NVMe is also an important technology direction to the storage world as well.

I think most of us are probably already mesmerized by solid state drives. The bombardment of marketing, presentations, advertising and whatever else the vendors do to promote (and self-promote) solid state drives are inundating the intellectual senses of consumers and enterprises alike. And yet, many vendors do not explain both the pros and cons of integrating solid states into their IT environment. Even worse, many don’t even know the strengths and weaknesses of solid states, hence creating some exaggeration that continues to create a spiral vortex of inaccuracies. Like a self-feeding frenzy, the industry seems to have placed solid state storage as the saviour of the enterprise storage world. Go figure with that!

Continue reading

The big boys better be flash friendly

An interesting article came up in the news this week. The article, from the ever popular The Register, mentioned 3 up and rising storage stars, Nimble Storage, Tintri and Tegile, and their assault on a flash strategy “blind spot” of the big boys, notably EMC and NetApp.

I have known about Nimble Storage and Tintri for a couple of years now, and I did take some time to read up on their storage technology offering. Tegile is new to me when it appeared on my radar after announced as the Gold Winner of the enterprise storage category for 2012.

The Register article intriqued me because it implied that these traditional storage vendors such as EMC and NetApp are probably doing a “band-aid” when putting together their flash storage strategy. And typically, I see these strategic concepts introduced by these 2 vendors:

  1. Have a server-side cache strategy by putting a PCIe card on the hosting server
  2. Have a network-based all-flash caching area
  3. Have a PCIe-based flash card on the storage system
  4. Have solid state drives (SSDs) in its disk shelves enclosures

In (1), EMC has VFCache (the server side caching software has been renamed to XtremSW Cache and under repackaging with the Xtrem brand name) and NetApp has it FlashAccel solution. Previously, as I was informed, FlashAccel was using the FusionIO ioTurbine solution but just days ago, NetApp expanded the LSI Nytro WarpDrive into its FlashAccel solution as well. The main objective of a server-side caching strategy using flash is to accelerate mostly read-based I/O operations for specific application workloads at the server side.

Continue reading

“I want to put in my own hard disk”

I want to put in my own hard disk“.

If a customer ever utter that sentence, it will trigger a storage vendor meltdown. Panic buttons, alarm bells, and everything else that will lead a salesman to go berserk. That’s a big NO, NO!

For decades, storage vendors have relied on proprietary hardware to keep customers in line, and have customers continue to sign hefty maintenance contracts until the next tech refresh. The maintenance contract, with support, software upgrades and hardware spares replacement, defines the storage networking industry that we are in. Even as some vendors have commoditized their hardware on the x86 platforms, and on standard enterprise hard disk drives (HDDs), NICs and HBAs, that openness and convenience of commodity hardware savings are usually not passed on the customers.

It is easy to explain to customers that keeping their enterprise data in reliable and high performance storage hardware with performance optimization and special firmware is paramount, and any unwarranted and unvalidated hardware would put the customer’s data at high risk.

There is a choice now. The ripple of enterprise-grade, open storage kernel and file system has just started its first ring, and we hope that this small ripple will reverberate across the storage industry in the next few years.

Continue reading

4TB disks – the end of RAID

Seriously? 4 freaking terabyte disk drives?

The enterprise SATA/SAS disks have just grown larger, up to 4TB now. Just a few days ago, Hitachi boasted the shipment of the first 4TB HDD, the 7,200 RPM Ultrastar™ 7K4000 Enterprise-Class Hard Drive.

And just weeks ago, Seagate touted their Heat-Assisted Magnetic Recording (HAMR) technology will bring forth the 6TB hard disk drives in the near future, and 60TB HDDs not far in the horizon. 60TB is a lot of capacity but a big, big nightmare for disks availability and data backup. My NetApp Malaysia friend joked that the RAID reconstruction of 60TB HDDs would probably finish by the time his daughter finishes college, and his daughter is still in primary school!.

But the joke reflects something very serious we are facing as the capacity of the HDDs is forever growing into something that could be unmanageable if the traditional implementation of RAID does not change to meet such monstrous capacity.

Yes, RAID has changed since 1988 as every vendor approaches RAID differently. NetApp was always about RAID-4 and later RAID-DP and I remembered the days when EMC had a RAID-S. There was even a vendor in the past who marketed RAID-7 but it was proprietary and wasn’t an industry standard. But fundamentally, RAID did not change in a revolutionary way and continued to withstand the ever ballooning capacities (and pressures) of the HDDs. RAID-6 was introduced when the first 1TB HDDs first came out, to address the risk of a possible second disk failure in a parity-based RAID like RAID-4 or RAID-5. But today, the 4TB HDDs could be the last straw that will break the camel’s back, or in this case, RAID’s back.

RAID-5 obviously is dead. Even RAID-6 might be considered insufficient now. Having a 3rd parity drive (3P) is an option and the only commercial technology that I know of which has 3 parity drives support is ZFS. But having 3P will cause additional overhead in performance and usable capacity. Will the fickle customer ever accept such inadequate factors?

Note that 3P is not RAID-7. RAID-7 is a trademark of a old company called Storage Computer Corporation and RAID-7 is not a standard definition of RAID.

One of the biggest concerns is rebuild times. If a 4TB HDD fails, the average rebuild speed could take days. The failure of a second HDD could up the rebuild times to a week or so … and there is vulnerability when the disks are being rebuilt.

There are a lot of talks about declustered RAID, and I think it is about time we learn about this RAID technology. At the same time, we should demand this technology before we even consider buying storage arrays with 4TB hard disk drives!

I have said this before. I am still trying to wrap my head around declustered RAID. So I invite the gurus on this matter to comment on this concept, but I am giving my understanding on the subject of declustered RAID.

Panasas‘ founder, Dr. Garth Gibson is one of the people who proposed RAID declustering way back in 1999. He is a true visionary.

One of the issues of traditional RAID today is that we still treat the hard disk component in a RAID domain as a whole device. Traditional RAID is designed to protect whole disks with block-level redundancy.  An array of disks is treated as a RAID group, or protection domain, that can tolerate one or more failures and still recover a failed disk by the redundancy encoded on other drives. The RAID recovery requires reading all the surviving blocks on the other disks in the RAID group to recompute blocks lost on the failed disk. In short, the recovery, in the event of a disk failure, is on the whole object and therefore, a entire 4TB HDD has to be recovered. This is not good.

The concept of RAID declustering is to break away from the whole device idea. Apply RAID at a more granular scale. IBM GPFS works with logical tracks and RAID is applied at the logical track level. Here’s an overview of how is compares to the traditional RAID:

The logical tracks are spread out algorithmically spread out across all physical HDDs and the RAID protection layer is applied at the track level, not at the HDD device level. So, when a disk actually fails, the RAID rebuild is applied at the track level. This significant improves the rebuild times of the failed device, and does not affect the performance of the entire RAID volume much. The diagram below shows the declustered RAID’s time and performance impact when compared to a traditional RAID:

While the IBM GPFS approach to declustered RAID is applied at a semi-device level, the future is leaning towards OSD. OSD or object storage device is the next generation of storage and I blogged about it some time back. Panasas is the leader when it comes to OSD and their radical approach to this is applying RAID at the object level. They call this Object RAID.

With object RAID, data protection occurs at the file-level. The Panasas system integrates the file system and data protection to provide novel, robust data protection for the file system.  Each file is divided into chunks that are stored in different objects on different storage devices (OSD).  File data is written into those container objects using a RAID algorithm to produce redundant data specific to that file.  If any object is damaged for whatever reason, the system can recompute the lost object(s) using redundant information in other objects that store the rest of the file.

The above was a quote from the blog of Brent Welch, Panasas’ Director of Software Architecture. As mentioned, the RAID protection of the objects in the OSD architecture in Panasas occurs at file-level, and the file or files constitute the object. Therefore, the recovery domain in Object RAID is at the file level, confining the risk and damage of data loss within the file level and not at the entire device level. Consequently, the speed of recovery is much, much faster, even for 4TB HDDs.

Reliability is the key objective here. Without reliability, there is no availability. Without availability, there is no performance factors to consider. Therefore, the system’s reliability is paramount when it comes to having the data protected. RAID has been the guardian all these years. It’s time to have a revolutionary approach to safeguard the reliability and ensure data availability.

So, how many vendors can claim they have declustered RAID?

Panasas is a big YES, and they apply their intelligence in large HPC (high performance computing) environments. Their technology is tried and tested. IBM GPFS is another. But where are the rest?


We raid vRAID

I took a bit of time off to read through Violin’s vRAID technology because I realized that vRAID (other than Violin’s vXM architecture) is the other most important technology that differentiates Violin Memory from the other upstarts. I blogged at a high-level about Violin a few entries ago, and we are continuing Violin impressive entrance with a storage technology that have been around for almost 25 years – RAID. Incidentally, I found this picture of the original RAID paper (see below):

Has RAID evolved with solid state storage? Evidently, no, because I have not read of any vendors (so far) touting any RAID revolution in their solid state offerings. There has been a lot of negative talks about RAID, but RAID has been the cornerstone and the foundation of storage ever since the beginning. But with the onslaughts of very large capacity HDDs, the demands of packing more bits-per-inch and the insatiable needs for reliability, RAID is slowly beginning to show its age. Cracks in the armour, I would say. And there are many newer, slightly more refined versions of RAID, from the Network RAID-style of HP P4000 or the Dell EqualLogic, to the RAID-X of IBM XIV, to innovations of declustered RAID in Panasas. (Interestingly, one of the early founders of the actual RAID concept paper, Garth Gibson, is the founder of Panasas).

And the new vRAID from Violin-System doesn’t sway much from the good ol’ RAID, but it has been adapted to address the issues of Solid State Devices.

Solid State devices (notably NAND Flash since everyone is using them) are very different from the usual spinning disks of HDDs. They behave differently and pairing solid state devices with the present implementations of RAID could be like mixing oil and water. I am not saying that the present RAID cannot work with solid state devices, but has RAID adapted to the idiosyncrasies of Flash?

It is like putting an old crank shaft into a new car. It might work for a while, but in the long run, it could damage the car. Similarly, conventional RAID might have detrimental performance and availability impact with solid state devices. And we have hardly seen storage vendors coming up to say that their RAID technology has been adapted to the solid state devices that they are selling. This silence could likely mean that they are just adapting to market requirements and not changing their RAID codes very much to take advantage of Flash or other solid state storage for that matter. Violin Memory has boldly come forward to meet that requirement and vRAID is their answer.

Violin argues that there are bottlenecks at the external RAID controller or software RAID level as well as use of legacy disk drive interfaces. And this is indeed true, because this very common RAID implementation squeezes performance at the expense of the other components such as CPU cycles.

Furthermore, there are plenty of idiosyncrasies in Flash with things such as erase-first, then write mechanism. The nature of NAND Flash, unlike DRAM, requires a block to be erased first before a write to the block is allowed. It does not “modify” per se, where the operations of read-modify-write is often applied in parity-based RAIDs of 5 and 6. Because of this nature, it is more like read-erase-write, and when the erase of the block is occurring, the read operation is stalled. That is why most SSDs will have impressive read latency (in microseconds), but very poor writes (in milliseconds). Furthermore, the parity-based RAID’s write penalty, can further aggravate the situation when the typical RAID technology is applied to NAND Flash solid state storage.

As the blocks in the NAND Flash build up, the accumulation of read-erase-write will not only reduce the lifespan of the blocks in the NAND Flash, it will also reduce the IOPS to a state we called Normalized Steady State. I wrote about this in my blog, “Not all SSDs are the same” some moons ago. In my blog, SNIA Solid State Storage Performance Testing Suite (SSS-PTS), there were 3 distinct phases of a typical NAND Flash SSD:

  • Fresh of out the Box (FOB)
  • Transition
  • Steady State
This performance degradation is part of what vendors call “Write Cliff”, where there is a sudden drop in IOPS performance as the NAND Flash SSD ages. Here’s a graph that shows the performance drop.
Violin’s vRAID, implemented within its switched vXM architecture itself, and using proprietary high performance flash controllers and the flash-optimized vRAID technology, is able deliver sustained IOPS throughout the lifespan of the flash SSD, as shown below:
To understand vRAID we have to understand the building blocks of the Violin storage array. NAND Flash chips of 4GB are packed into a Flash Package of 8 giving it 32GB. And 16 of these 32GB Flash Package are then consolidated into a 512GB VIMM (Violin Inline Memory Module). The VIMM is the starting block and can be considered as a “disk”, since we are used to the concept of “disk” in the storage networking world. 5 of these VIMMs will create a RAID group of 4+1 (four data and one parity), giving the redundancy, performance and capacity similar to RAID-5.
The block size used is 4K block and this 4K block is striped across the RAID group with 1K pages each on each of the VIMMs in the RAID group. Each of this 1K page is managed independently and can be placed anywhere in any flash block in the VIMMs, and spread out for lowest possible latency and bandwidth. This contributes to the “spike free latency” of Violin Memory. Additionally, there is ECC protection within each 1K page to correct flash bit error.
To protect against metadata corruption, there is an additional, built-in RAID Check bit to correct the VIMM errors. Lastly, one important feature that addresses the read-erase-write weakness of NAND Flash, the vRAID ensures that the slow erases never block a Read or a Write. This architectural feature enable spike-free latency in mixed Read/Write environments.
Here’s a quick overview of Violin’s vRAID architecture:
I still feel that we need a radical move away from the traditional RAID and vRAID is moving in the right direction to evolve RAID to meet the demands of the data storage market. Revolutionary and radical it may not be, but then again, is the market ready for anything else?
As I said, so far Violin is the only all-Flash vendor that has boldly come forward to meet the storage latency problem head-on, and they have been winning customers very quickly. Well done!

Lightning about to strike

Watch out for February 6th, 2012 folks! The Lightning is about to strike!

Yes, it is likely that EMC will be announcing their server-based, 8-lane PCIe Flash memory card in early week of February. The PCIe card was dubbed “Project Lightning” when it was first announced in EMC World in May last year. It represents EMC’s first foray of products that sits on the server side, giving the impression that EMC could be entering the server business. I blogged about this way back in September last year. As explained by the EMC folks, they are not going into the server business but rather “extending” their performance tiering into the server space. Think of it like an umbilical cord that  sucks the server’s CPU processing power to give maximum performance boost for the EMC storage.

The card will sport Solid State Drive from LSI Warp Drive and comes in 100/200/300GB capacity. Here’s a picture of how the Lightning card would look like:

The SSD is an SLC (Single Level Cell) and is capable of delivering 150,000 random reads IOPS based on 4K blocks and 190,000 random writes IOPS. It can squeeze 1.4GB/sec in read throughput. While it is not on par with the performance of Fusion-IO, it can definitely do well leveraging EMC’s huge customer base. Furthermore, PCIe-based Flash memory cards such as Fusion-IO will not be able to take advantage of the bridge that links the server and the storage, making it confined to the server’s resources. The advantage is definitely EMC when you explore the possibilities.

Here’s a view of a slide from Virtual Geek summarizing the Project Lightning:

The Lightning card is aimed at customers who demand the highest performance, even higher that Tier 0. It will be integrated with EMC’s FAST (Fully Automated Storage Tiering) technology and is available to the VNX and VMAX platforms.

So watch out folks, because Lightning is about to strike soon!

Not all SSDs are the same

Happy Lunar New Year! The Chinese around world has just ushered in the Year of the Water Dragon yesterday. To all my friends and family, and readers of my blog, I wish you a prosperous and auspicious Chinese New Year!

Over the holidays, I have been keeping up with the progress of Solid State Drives (SSDs). I am sure many of us are mesmerized by SSDs and the storage vendors are touting the best of SSDs have to offer. But let me tell you one thing – you are probably getting the least of what the best SSDs have to offer. You might be puzzled why I say things like this.

Let me share with a common sales pitch. Most (if not all) storage vendors will tout performance (usually IOPS) as the greatest benefits of SSDs. The performance numbers have to be compared to something, and that something is your regular spinning Hard Disk Drives (HDDs). The slowest SSDs in terms of IOPS is about 10-15x faster than the HDDs. A single SSD can at least churn 5,000 IOPS when compared to the fastest 15,000 RPM HDDs, which churns out about 200 IOPS (depending on HDD vendors). Therefore, the slowest SSDs can be 20-25x faster than the fastest HDDs, when measured in IOPS.

But the intent of this blogger is to share with you more about SSDs. There’s more to know because SSDs are not built the same. There are write-bias SSDs, read-bias SSDs; there are SLC (single level cell) and MLC (multi level cell) SSDs and so on. How do you differentiate them if Vendor A touts their SSDs and Vendor B touts their SSDs as well? You are not comparing SSDs and HDDs anymore. How do you know what questions to ask when they show you their performance statistics?

SNIA has recently released a set of methodology called “Solid State Storage (SSS) Performance Testing Specifications (PTS)” that helps customers evaluate and compare the SSD performance from a vendor-neutral perspective. There is also a whitepaper related to SSS PTS. This is something very important because we have to continue to educate the community about what is right and what is wrong.

In a recent webcast, the presenters from the SNIA SSS TWG (Technical Working Group) mentioned a few questions that I  think we as vendors and customers should think about when working with an SSD sales pitch. I thought I share them with you.

  • Was the performance testing done at the SSD device level or at the file system level?
  • Was the SSD pre-conditioned before the testing? If so, how?
  • Was the performance results taken at a steady state?
  • How much data was written during the testing?
  • Where was the data written to?
  • What data pattern was tested?
  • What was the test platform used to test the SSDs?
  • What hardware or software package(s) used for the testing?
  • Was the HBA bandwidth, queue depth and other parameters sufficient to test the SSDs?
  • What type of NAND Flash was used?
  • What is the target workload?
  • What was the percentage weight of the mix of Reads and Writes?
  • Are there warranty life design issue?

I thought that these questions were very relevant in understanding SSDs’ performance. And I also got to know that SSDs behave differently throughout the life stages of the device. From a performance point of view, there are 3 distinct performance life stages

  • Fresh out of the box (FOB)
  • Transition
  • Steady State


As you can see from the graph below, a SSD, fresh out of the box (FOB) displayed considerable performance numbers. Over a period of time (the graph shown minutes), it transitioned into a mezzanine stage of lower IOPS and finally, it normalized to the state called the Steady State. The Steady State is the desirable test range that will give the most accurate type of IOPS numbers. Therefore, it is important that your storage vendor’s performance numbers should be taken during this life stage.

Another consideration when understanding the SSDs’ performance numbers are what type of tests used? The test could be done at the file system level or at the device level. As shown in the diagram below, the test numbers could be taken from many different elements through the stack of the data path.


Performance for cached data would given impressive numbers but it is not accurate. File system performance will not be useful because the data travels through different layers, masking the true performance capability of the SSDs. Therefore, SNIA’s performance is based on a synthetic device level test to achieve consistency and a more accurate IOPS numbers.

There are many other factors used to determine the most relevant performance numbers. The SNIA PTS test has 4 main test suite that addresses different aspects of the SSD’s performance. They are:

  • Write Saturation test
  • Latency test
  • IOPS test
  • Throughput test

The SSS PTS would be able to reveal which is a better SSD. Here’s a sample report on latency.

Once again, it is important to know and not to take vendors’ numbers in verbatim. As the SSD market continue to grow, the responsibility lies on both side of the fence – the vendor and the customer.


SSDs rising in the flood crisis

The Thailand flood last year spelled disaster to the storage industry. We have already seen several big boys in the likes of HP, EMC and NetApp announcing the rise of prices because of the flood.

NetApp’s announcement is here; EMC is here; and HP is here, if you want to read about it. Below is a nice and courteous EMC letter to their customers.

But the Chinese character of “crisis” (below) also spells opportunities; opportunities for Solid State Drives (SSDs) that is.

For those of us close to the ground, the market for spinning hard disk drives (HDDs) has certainly been challenging for the past few months, especially for smaller system providers like us. Without the leveraging powers of the bigger boys, we practically had to beg to buy HDDs, not to mention the fact that the price has practically doubled.

Before the Thailand flood crisis, the GB/$ of a 2TB HDD was 0.325 Malaysian ringgit per GB. That’s about 33 cents. Today, the price is about 55 cents per GB. In comparison, at least from my experience, the GB/$ of SSDs has gone down from $5.83 to $4.99.

I know some of you might pooh-pooh the price difference between a 2TB SATA/SAS and a 120GB SSD, partly because the SSD seems so expensive. But when you consider that doing the math, the SSDs is likely to be 50x faster (at worst average) and 200x faster (at best average) for applications requiring IOPS, this could mean that transactional applications are likely to be completed an average of 100x faster, with better response time, with lower latency. This will have a domino effect on other related applications, making the entire service request performing and completing faster. When we put a price to the transactional hours, for example $10/hour work, then we can see the cost savings coming from using SSDs in the storage.

Interestingly, a friend of mine asked me about the prominence of an all SSDs storage systems. I have written about all SSDs systems in the past, and also did a high overview of Pure Storage some time back. And a very interesting fact I recalled was these systems having massive amount of IOPS. Having plenty of IOPS helps because you do away with Automated Storage Tiering (AST) because you don’t have to tier your data, and you don’t have to pay for such a feature.

Yes, all-SSDs pure-play storage systems are gaining prominence and it’s time to take notice. Nimbus beat NetApp and HP 3PAR last year to win eBay with an all SSDs storage solution and other players such as Violin Memory Systems, Pure Storage, SolidFire and of course, Texas Memory Systems (aka RAMSAN). And they are attracting big names into their management portfolios and getting VC dollars of course.

The Thailand flood aftermath will probably take 6 months or more to return to its previous production capacity prior to the crisis and SSDs can take this window of opportunity in the crisis to surge ahead. And if this flood is going to be an annual thing for Thailand (God bless Thailand), HDD market is going to have a perennial problem. And SSDs is going to rise even faster.


Apple chomps Anobit

A few days ago, Apple paid USD$500 million to buy an Israeli startup, Anobit, a maker of flash storage technology.

Obviously, one of the reasons Apple did so is to move up a notch to differentiate itself from the competition and positions itself as a premier technology innovator. It has won the MP3 war with its iPod, but in the smartphones, tablets and notebooks space, Apple is being challenged strongly.

Today, flash storage technology is prevalent, and the demand to pack more capacity into a small real-estate of flash will eventually lead to reliability issues. The most common type of NAND flash storage is the MLC (multi-level cells) versus the more expensive type called SLC (single level cells).

But physically and the internal-build of MLC and SLC are the exactly the same, except that in SLC, one cell contains 1 bit of data. Obviously this means that 2 or more bits occupy one cell in MLC. That’s the only difference from a physical structure of NAND flash. However, if you can see from the diagram below, SLCs has advantages over MLCs.


NAND Flash uses electrical voltage to program a cell and it is always a challenge to store bits of data in a very, very small cell. If you apply too little voltage, the bit in the cell does not register and will result in something unreadable or an error. If you apply too much voltage, the adjacent cells are disturbed and resulting in errors in the flash. Voltage leak is not uncommon.

The demands of packing more and more data (i.e. more bits) into one cell geometry results in greater unreliability. Though the reliability of  the NAND Flash storage is predictable, i.e. we would roughly know when it will fail, we will eventually reach a point where the reliability of MLCs will no longer be desirable if we continue the trend of packing more and more capacity.

That’s when Anobit comes in. Anobit has designed and implemented architectural changes of the way NAND Flash storage is used. The technology in laymen terms comes in 2 stages.

  1. Error reduction – by understanding what causes flash impairment. This could be cross-coupling, read disturbs, data retention impairments, program disturbs, endurance impairments
  2. Error Correction and Signal Processing – Advanced ECC (error-correcting code), and introducing the patented (and other patents pending) Memory Signal Processing (TM) to improve the reliability and performance of the NAND Flash storage as show in the diagram below:

In a nutshell, Anobit’s new and innovative approach will result in

  • More reliable MLCs
  • Better performing MLCs
  • Cheaper NAND Flash technology

This will indeed extend the NAND Flash technology into greater innovation of flash storage technology in the near future. Whatever Apple will do with Anobit’s technology is anybody’s guess but one thing is certain. It’s going to propel Apple into newer heights.

Betcha don’t encrypt your disks

At the Internet Alliance event this morning, someone from Computerworld gave me a copy of their latest issue. The headline was “Security Incidents Soar”, with the details of the half-year review by CyberSecurity Malaysia.

Typically, the usual incidents list evolve around spam, intrusions, frauds, viruses and so on. However, storage always seems to be missing. As I see it, storage security doesn’t sit well with the security guys. In fact, storage is never the sexy thing and it is usually the IPS, IDS, anti-virus and firewall that get the highlights. So, when we talk about storage security, there is so little to talk about. In fact, in my almost 20-years of experience, storage security was only brought up ONCE!

In security, the most valuable piece of asset is data and no matter where the data goes, it always lands on …. STORAGE! That is why storage security could be one of the most overlooked piece in security. Fortunately, SNIA already has this covered. In SNIA’s Solid State Storage Initiative (SSSI), one aspect that was worked on was Self Encrypted Drives (SED).

SED is not new. As early as 2007, Seagate already marketed encrypted hard disk drives. In 2009, Seagate introduced enterprise-level encrypted hard disk drives. And not surprisingly, other manufacturers followed. Today, Hitachi, Toshiba, Samsung, and Western Digital have encrypted hard disk drives.

But there were prohibitive factors that dampened the adoption of self-encrypted drives. First of all, it was the costs. It was expensive a few years ago. There was (and still is) a lack of knowledge between the hardware of Self Encrypted Drives (SED) and software-based encryption. As the SED were manufactured, some had proprietary implementations that did not do their part to promote the adoption of SEDs.

As data travels from one infrastructure to another, data encryption can be implemented at different points. As the diagram below shows,


encryption can be put in place at the software level, the OS level, at the HBA, the network itself. It can also happen at the switch (network or fabric), at the storage array controller or at the hard disk level.

EMC multipathing software, PowerPath, has an encryption facility to ensure that data is encryption on its way from the HBA to the EMC CLARiiON storage controllers.

The “bump-in-the-wire” appliance is a bridge device that helps in composing encryption to the data before it reaches the storage. Recall that NetApp had a FIPS 140 certified product called Decru DataFort, which basically encrypted NAS and SAN traffic en-route to the NetApp FAS storage array.

And according to SNIA SSSI member, Tom Coughlin, SED makes more sense that software-based security. How does SED work?

First of all, SED works with 2 main keys:

  • Authentication Key (AK)
  • Drive Encryption Key (DEK)

The DEK is the most important component, because it is a symmetric key that encrypts and decrypts data on the HDDs or SSDs. This DEK is not for any Tom (sorry Tom), Dick and Harry. In order to gain access to DEK, one has to be authenticated and the authentication is completed by having the right authentication key (AK). Usually the AK is based on a 128/26-bit AES or DES and DEK is of a higher bit range. The diagram below shows the AK and DEK in action:

Because SED occurs at the drive level, it is significantly simpler to implement, with lower costs as well. For software-based encryption, one has to set up some form of security architecture. IPSec comes to mind. This is not only more complex, but also more costly to implement as well. Since it is software, the degree of security compromise is higher, meaning, the security model is less secure when compared to SED. The DEK of the SED does not leave the array, and if the DEK is implemented within the disk enclosure or the security module of SoC (System-on-Chip), this makes even more secure that software-based encryption. Also, the DEK is away from the CPU and memory, thus removing these components as a potential attack vendor that could compromise the data on the disks drives.

Furthermore, software-based encryption takes up CPU cycles, thus slows down the overall performance. In the Tom Coughlin study, based on both SSDs and HDDs, the performance of SED outperforms software-based encryption every time. Here’s a table from that study:

Another security concern is about data erasure. According to an old IBM study, about 90% of the retired HDDs still has data that is readable. That means that data erasure techniques used are either not implemented properly or simply not good enough. For us in the storage industry, an effective but time consuming technique is to overwrite the entire disks with 1s and reusing it. But to hackers, there are ways to “undelete” these bits and make the data readable again.

SED provides crypto erasure that is both effective and very quick. Since the data encryption key (DEK) was used to encrypt and decrypt data, the DEK can be changed and renewed in split seconds, making the content of the disk drive unreadable. The diagram below shows how crypto erasure works:

Data security is already at its highest alert and SEDs are going to be a key component in the IT infrastructure. The open and common standards are coming together, thanks to efforts to many bodies including SNIA. At the same time, product certifications are coming up and more importantly, the price of SED has come to the level that it is almost on par with normal, non-encrypted drives.

Hackers and data thieves are getting smarter all the time and yet, the security of the most important place of where the data rest is the least considered. SNIA and other bodies hope to create more awareness and seek greater adoption of self encrypted drives. We hope you will help spread the word too. Betcha thinking twice now about encrypting your data  on your disk drives now.