Nimbus beats NetApp at eBay – The details and the conspiracy of vendors

In my last entry, I mentioned that Nimbus now has 100TB at eBay and every single TB of it is on SSDs. The full details of how the deal was thrashed out are here, including how Nimbus beat the competition from NetApp and 3PAR, the incumbents.

The significance of the deal is that an all-SSD system was able to beat, on price, storage arrays built on a hybrid of spinning disks and SSDs.

The Nimbus news just obliterated the myth that SSDs are expensive. If you do the math, perhaps the bulk of the price of an entire storage system is not the SSDs. It could be the way some vendors structure their software licensing schemes, or a combination of licensing, support and so on.

Just last week, we were out discussing hard disks and SSDs. The crux of the discussion was pricing, and the customer we were speaking to was perplexed that the typical SATA disks from vendors such as HP, NetApp and so on cost a lot more than the enterprise HDDs and SSDs you get from the distributors. Sometimes it is a factor of 3-4x.

I contributed my side of the story: one unit of 1TB SATA (mind you, this is an enterprise-grade HDD from Seagate) from a particular vendor would cost about RM4,000 to RM5,000. The usual story that we were trained to tell when we worked for vendors was, “Oh, these disks have to be specially provisioned with our own firmware, and we can monitor their health with our software and so on ….”. My partner chipped in and cleared the BS smoke screen: basically, the high-priced disks come with high margins for the vendor, to feed the entire backline of the storage product, from sales to engineers to engineering and so on. He hit the nail right on the head, because I believe a big part of the margin of each storage system goes back to feed the vendor’s army of people behind the product.

In my research, a 2TB enterprise-grade SATA HDD in Malaysia is approximately RM1,000 or less. Similarly, a SAS HDD would be slightly higher, by 10-15%, while an enterprise-grade SSD is about RM3,000 or less. This is far less than what the storage array vendors quote.
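To make the gap concrete, here is a rough back-of-the-envelope sketch in Python using the figures quoted above. It takes the RM1,000 upper bound for the 2TB street price, and the helper name price_per_tb is my own, purely for illustration. Note that it compares price per TB, which is not quite the same thing as the per-disk 3-4x factor mentioned earlier.

```python
def price_per_tb(price_rm, capacity_tb):
    """Return the price in RM for one terabyte of raw capacity."""
    return price_rm / capacity_tb


# Figures quoted in the text (RM = Malaysian Ringgit).
vendor_low = price_per_tb(4000, 1)   # vendor-supplied 1TB SATA, low end of the range
vendor_high = price_per_tb(5000, 1)  # vendor-supplied 1TB SATA, high end of the range
street = price_per_tb(1000, 2)       # 2TB enterprise SATA from a distributor (upper bound)

# How many times more the vendor disk costs per TB.
markup_low = vendor_low / street
markup_high = vendor_high / street

if __name__ == "__main__":
    print(f"Street price per TB: RM{street:.0f}")
    print(f"Vendor price per TB: RM{vendor_low:.0f} to RM{vendor_high:.0f}")
    print(f"Per-TB markup: {markup_low:.0f}x to {markup_high:.0f}x")
```

On a per-TB basis the gap works out to roughly 8x to 10x with these numbers, because the vendor disk is also half the capacity.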

Of course, the question would be, “Can the customer put in their own hard disks, or ask the vendor to purchase cheaper hard disks from a cheaper source?” Apparently not! Unless you buy a low-end NAS from the likes of NetGear, Synology, Drobo and many other low-end storage vendors. But you can’t bet your business and operations on the reliability of these storage boxes, can you? Otherwise, it’s your head on the chopping block.

Eventually, the customers will demand such a “feature”. They will want to put in their own hard disks (with proper qualification from the storage vendor) because they will want cheaper HDDs or SSDs. It is already happening with some enterprise storage vendors but these vendors are not well known yet. It is happening though. I know of one vendor in Malaysia who could do such a thing …

SSDs coming into mainstream … be Ready!

There has been a slew of SSD news in the storage blogosphere with the big one from eBay.

eBay has just announced that it has 100TB of SSDs from Nimbus Data Systems. On top of that, OCZ, SanDisk and STEC, all major SSD manufacturers, have announced a whole lot of new products, with the PCIe SSD cards leading the way. The most interesting thing is that the $/GB has gone down significantly, getting very close to the $/GB of spinning disks. This is indeed good news for the industry because SSDs deliver low latency, high IOPS, low power consumption and many other new benefits.

Side note: As I am beginning to understand more about SSDs, I found out that NAND flash SSDs have latency in the microseconds range, compared to spinning HDDs, which have latency in the milliseconds range. In addition to that, DRAM SSDs have latency in the nanoseconds range, which is basically memory-type access. DRAM SSDs are, of course, more expensive.

SSDs are coming into the mainstream very soon, and this will inevitably drive a new generation of applications and accelerate growth in knowledge acquisition. We are already seeing the decline of Fibre Channel disks and the rise of SAS and SATA disks, but SSDs in enterprise storage, as far as I am concerned, bring forth 2 new challenges which we, as professionals and users in the storage networking environment, must address.

These challenges can be simplified to

  1. Are we ready?
  2. Where is the new bottleneck?

To address the first challenge, we must understand the second challenge first.

In system architectures, we know of various performance bottlenecks that exist in the CPU, memory, bus, bridge, buffer, I/O devices and so on. In order to deliver the data to be processed, we have to view the data block/byte service request in its entirety.

When a user requests a file, this is a service request. The end objective is that the user is able to read and write the file he/she requested. The time taken from the beginning of the request to the end of it is known as the service time, and latency plays a big part in it. We assume that the file resides in a NAS system in the network.

The request for the file begins by going through the file system layer of the host the user is accessing, then through the user and kernel space, moving on through the device driver of the NIC, through the TCP/IP stack (which has its own set of buffer overheads and so on), and passing the request onto the physical wire. From there it moves on through the NAS system, with its RAID system, file system and so on, until it reaches the requested file. Note that I have shortened the entire process for simple explanation, but it shows that the service request passes through a whole lot of things in order to complete.

Bottlenecks exist everywhere within the service request path, which is also subject to external factors related to that service request. For a long, long time, I/O has been the biggest bottleneck in the processing of a service request because it is almost always the slowest component in the entire scheme of things.

The introduction of SSDs will improve I/O performance tremendously, into the microsecond or even nanosecond range, putting it on almost equal performance terms with other components in the system architecture. The buses and the bridges in the computer systems could be the new locations where the bottleneck of a service request exists. Hence we have to use this understanding to change the modus operandi of existing types of applications such as databases, email servers and file servers.
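A small Python sketch may help to picture the shift. It treats the service time as the sum of the latencies of each stage in the request path, and the bottleneck as the slowest stage. Every number below is a made-up, illustrative value in microseconds (not a measurement), and the stage names and helper functions are my own simplification of the path described earlier.

```python
def service_time_us(stages):
    """Total service time: the sum of every stage's latency (microseconds)."""
    return sum(stages.values())


def bottleneck(stages):
    """Name of the slowest stage in the request path."""
    return max(stages, key=stages.get)


def with_media(stages, media_latency_us):
    """Copy of the path with a different storage media latency."""
    updated = dict(stages)
    updated["storage media"] = media_latency_us
    return updated


# Hypothetical latencies, for illustration only.
PATH_WITH_HDD = {
    "host file system and kernel": 50,
    "NIC driver and TCP/IP stack": 100,
    "network wire": 200,
    "NAS RAID and file system": 150,
    "storage media": 8000,  # spinning disk: milliseconds
}

# Same path, but the media now answers in the microseconds range.
PATH_WITH_SSD = with_media(PATH_WITH_HDD, 100)

if __name__ == "__main__":
    for name, path in (("HDD", PATH_WITH_HDD), ("SSD", PATH_WITH_SSD)):
        print(f"{name}: {service_time_us(path)} us, bottleneck = {bottleneck(path)}")
```

With these assumed values, the storage media is the bottleneck in the HDD case, but once the media latency drops, the slowest stage becomes something else in the path. Which stage that is in a real system depends entirely on the real numbers.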

The usual tried-and-tested best practices may have to be changed to adapt to the shift of the bottleneck.

So, we have to equip ourselves with an understanding of what SSDs are doing and will do to the industry. We have to be ready and take advantage of this “quiet” period to learn more about SSD technology and what the experts are saying. I found a great website that introduces and speaks about SSDs in depth. It is called StorageSearch and it is what I consider the best treasure trove on the web right now for SSD information. It is run by a gentleman named Zsolt Kerekes. Go check it out.

Yup, we must be ready when SSDs hit the mainstream, and ride the wave.



Can snapshots replace traditional backups?

Backup is a necessary evil. In IT, every operator, administrator, engineer, manager, and C-level executive knows that you have got to have backup. When it comes to the protection of data and information in a business, backup is the only way.

Backup has also become the bane of IT operations. Every product that is out there in the market is trying to cram as much production data into the backup as possible just to fit into the backup window. We only have 24 hours in a day, so there is no way the backup window can be increased unless:

  • You reduce the size of the primary data to be backed up – think compression, deduplication, archiving
  • You replicate the primary data to a secondary device and back up the secondary device – which is ironic because when you replicate, you are creating a copy of the primary data, which technically is a backup. So you are technically backing up a backup
  • You speed up the transfer of primary data to the backup device
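The arithmetic behind the backup window is simple enough to sketch in Python. The function below is my own illustration, not any product's formula: it assumes a constant transfer rate, decimal units (1 TB = 1,000,000 MB) and a single data reduction ratio standing in for compression and deduplication. The 20TB and 100MB/s figures are hypothetical.

```python
def backup_hours(data_tb, throughput_mb_per_s, reduction_ratio=1.0):
    """Hours needed to move data_tb terabytes at a constant transfer rate.

    reduction_ratio models compression/deduplication, e.g. 4.0 means 4:1.
    Uses decimal units: 1 TB = 1,000,000 MB.
    """
    if throughput_mb_per_s <= 0 or reduction_ratio <= 0:
        raise ValueError("throughput and reduction ratio must be positive")
    data_mb = data_tb * 1_000_000 / reduction_ratio
    return data_mb / throughput_mb_per_s / 3600


if __name__ == "__main__":
    # Hypothetical example: 20 TB of primary data over a 100 MB/s path.
    print(f"As is:            {backup_hours(20, 100):.1f} hours")
    print(f"4:1 reduction:    {backup_hours(20, 100, 4.0):.1f} hours")
    print(f"Twice the speed:  {backup_hours(20, 200):.1f} hours")
```

In this made-up case the unreduced backup would not even fit into a 24-hour day, which is why reducing the data and speeding up the transfer are the levers everyone pulls.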

Either way, IT operations are trying to overcome the challenges of the backup window. And the whole purpose of backup is to be cock-sure that data can be restored when it comes to recovery. It’s like insurance. You pay the premium so that you are able to use the insurance facility to recover in times of need. We have heard that analogy many times before.

On the flip side of the coin, a snapshot is also a backup. Snapshots are point-in-time copies of the primary data and, many a time, snapshots are taken and then used as the source of a “true” backup to a secondary device, be it disk-based or tape-based. However, snapshots have suffered from the perception that they are a pseudo-backup, at least until the last couple of years.

Here is some food for thought …

WHAT IF we eliminate backing up data to a secondary device?

WHAT IF IT operations are ready to embrace snapshots as the true backup?

WHAT IF we rely on snapshots for backup and replicated snapshots for disaster recovery?

First of all, it will solve the perennial issues of backup to a “secondary device”. The operative word here is the “secondary device”, because that secondary device is usually external to the primary storage.

Tape subsystems and tapes are constantly being ridiculed as the culprits behind missed backup windows. Duplication after duplication of the same set of files in every backup set triggered the adoption of deduplication solutions from Data Domain, Avamar, PureDisk, ExaGrid, Quantum and so on. Networks are also blamed because network backup runs through the LAN. LAN-less backup uses another conduit, usually Fibre Channel, to transport data to the secondary device.

If we eliminate the “secondary device” and perform backup in the primary storage itself, then networks are no longer part of the backup. There is no need for deduplication because the data could already have been deduplicated and compressed in the primary storage.

Note that what I have suggested is to back up, compress and dedupe, AND also restore from the primary storage. There is no secondary storage device for backup, compression, dedupe and restore.

Wouldn’t that paint a better way of doing backup?

Snapshots will be the only mechanism for backup. Snapshots are quick, usually taking minutes and some only seconds. Most snapshot implementations today are space efficient, consuming storage only for delta changes. The primary device will compress and dedupe, depending on the data’s characteristics.
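To show why a snapshot can be both quick and space efficient, here is a toy copy-on-write sketch in Python. It is only one of several ways snapshots are implemented (some vendors use redirect-on-write or other schemes), and the Volume class, its methods and the block contents are all hypothetical names of my own, not any vendor's design.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # live data: block id -> contents
        self.snapshots = []         # per snapshot: block id -> preserved old contents

    def snapshot(self):
        """Take a snapshot. Nothing is copied yet, so this is near-instant."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_id, data):
        """Overwrite a block, preserving its old contents for any snapshot
        that has not already saved its own copy of that block."""
        for preserved in self.snapshots:
            if block_id not in preserved:
                preserved[block_id] = self.blocks.get(block_id)
        self.blocks[block_id] = data

    def read_snapshot(self, snap_id, block_id):
        """Read a block as it looked when the snapshot was taken."""
        preserved = self.snapshots[snap_id]
        if block_id in preserved:
            return preserved[block_id]
        return self.blocks.get(block_id)  # unchanged since the snapshot

    def snapshot_size(self, snap_id):
        """Number of blocks the snapshot actually consumes (the delta)."""
        return len(self.snapshots[snap_id])


if __name__ == "__main__":
    vol = Volume({0: "alpha", 1: "bravo", 2: "charlie"})
    snap = vol.snapshot()
    vol.write(1, "BRAVO-v2")
    print("live block 1:    ", vol.blocks[1])
    print("snapshot block 1:", vol.read_snapshot(snap, 1))
    print("blocks consumed: ", vol.snapshot_size(snap))
```

Only the one block that changed after the snapshot takes up extra space, and a restore simply reads the preserved blocks back.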

For DR, snapshots are shipped to remote storage of equal prowess at the DR site, where the snapshot can be rebuilt and kept in a ready mode to become primary data when required. NetApp SnapVault is one example. ZFS snapshot replication is another.

And when it comes to recovery, quick restores of primary data will be from snapshots. If the primary storage goes down, clients and host initiators can be rerouted quickly to the DR device for services to resume.

I believe that with the convergence of multi-core processing power, 10GbE networks, SSDs and very large capacity drives, we could be seeing a shift in the backup design model and possibly the entire IT landscape. Snapshots could very likely replace traditional backup in the near future, and the secondary device may become a thing of the past.

Solid State Drives … are they reliable?

There have been a lot of questions about Solid State Drives (SSDs), also called Enterprise Flash Drives (EFDs) by some vendors. Are they less reliable than our 10K or 15K RPM hard disk drives (HDDs)? I was asked this question on stage, in the middle of presenting the topic of Green Storage 3 weeks ago.

Well, the usual answer from the typical techie is … “It depends”.

We all fear the unknown and, given the limited knowledge we have about SSDs (they are fairly new in the enterprise storage market), we tend to be drawn more to the negatives than the positives of what SSDs are and what they can be. I, for one, believe that SSDs have more positives and, over time, we will grow to accept that this is all part of the IT evolution. IT has always evolved into something better, stronger, faster, more reliable and so on. As famously quoted by Jeff Goldblum’s character Dr. Ian Malcolm in the movie Jurassic Park, “Life finds a way …”, and IT will always find a way to be just that.

SSDs are typically categorized into MLC (multi-level cell) and SLC (single-level cell) types. They have a fairly predictable life expectancy, ranging from tens of thousands of writes to more than a million writes per drive. This, by no means, is a measure of the reliability of SSDs versus HDDs. However, SSD controllers and drives employ various techniques to enhance the durability of the drives. A common method is to balance the I/O accesses across the disk blocks, adapting to the I/O usage patterns. This prolongs the lifespan of the disk blocks (and subsequently the drive itself) and also ensures that the performance of the drive does not lag, since the I/O is more “spread out” across the drive. This is known as a “wear-leveling” algorithm.
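The idea behind wear-leveling can be sketched in a few lines of Python. This is a deliberately simplified model of my own: it only counts how many times each physical block gets written, and the leveling policy (always pick the least-worn block) is the most basic one imaginable. Real SSD controllers are far more involved, and the block and write counts below are hypothetical.

```python
def writes_without_leveling(num_blocks, num_writes):
    """A hot piece of data always lands on the same physical block."""
    wear = [0] * num_blocks
    wear[0] = num_writes
    return wear


def writes_with_leveling(num_blocks, num_writes):
    """Each write is steered to whichever block has been written the least."""
    wear = [0] * num_blocks
    for _ in range(num_writes):
        least_worn = wear.index(min(wear))
        wear[least_worn] += 1
    return wear


if __name__ == "__main__":
    # Hypothetical drive: 8 blocks receiving 8,000 rewrites of hot data.
    naive = writes_without_leveling(8, 8000)
    leveled = writes_with_leveling(8, 8000)
    print("Most-worn block without leveling:", max(naive))
    print("Most-worn block with leveling:   ", max(leveled))
```

Since a block wears out after a limited number of writes, spreading the same workload across all blocks means the most-worn block takes a fraction of the punishment, and the drive as a whole lasts longer.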

Most SSDs proposed by enterprise storage vendors are MLC, to meet the market’s $/IOPS and $/GB demands, because SLC is definitely more expensive in exchange for its higher durability. MLC also has a higher BER (bit error rate): it is said that MLC sees 1 bit error per 10,000 writes while SLC sees 1 per 100,000 writes.

But the advantages of SSDs clearly outweigh those of HDDs. Fast access (much lower latency) is one of the main advantages. Higher IOPS is another. SSDs can provide from several thousand IOPS to more than 1 million IOPS. By comparison, a typical enterprise 7,200 RPM SATA drive delivers less than 120 IOPS while a 15,000 RPM Fibre Channel or SAS drive ranges from 130-200 IOPS. That IOPS advantage is definitely a vast differentiator when comparing SSDs and HDDs.
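A quick way to feel that difference is to count how many spinning disks it takes to reach a given IOPS target. The Python sketch below uses the per-drive figures quoted above. The 10,000 IOPS target is a hypothetical workload of my own, and the calculation ignores RAID overhead, caching and controller limits.

```python
import math


def drives_needed(target_iops, iops_per_drive):
    """Minimum number of drives whose combined IOPS meets the target."""
    if target_iops <= 0 or iops_per_drive <= 0:
        raise ValueError("IOPS figures must be positive")
    return math.ceil(target_iops / iops_per_drive)


if __name__ == "__main__":
    target = 10_000  # hypothetical workload
    print("7,200 RPM SATA at 120 IOPS:  ", drives_needed(target, 120), "drives")
    print("15,000 RPM FC/SAS at 200 IOPS:", drives_needed(target, 200), "drives")
    print("SSD at 10,000 IOPS:           ", drives_needed(target, 10_000), "drive")
```

Dozens of spindles to do the work of one SSD also means more power, more cooling and more rack space, which is where the Green Storage angle comes in.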

We are also seeing both drive-format and card-format SSDs in the market. The drive-format type typically comes in the 2.5″ and 3.5″ profiles and tends to fit into enterprise storage systems as “disk drives”. These are known to provide capacity. On the other hand, there are also card-format SSDs, built onto a PCIe card that is inserted into host systems. These tend to address the performance requirements of systems and applications. The well-known PCIe vendors are Fusion-io, which is in the high-end performance market, and NetApp, which peddles the PAM (Performance Acceleration Module) card in its filers. The PAM card has been renamed FlashCache. Rumour has it that EMC will be coming out with a similar solution soon.

Another thing to note is that SSDs can be read-biased or write-biased. Most SSDs in the market tend to be more read-biased, published with high read IOPS, not write IOPS. Therefore, we have to be prudent and know what is out there. This means that some solutions, such as the NetApp FlashCache, are more suitable for read-heavy I/O than write-heavy I/O. The FlashCache addresses a large segment of the enterprise market because most applications are heavier on reads than writes.

SSDs have been positioned as the Tier 0 layer in the Automated Storage Tiering segment of Enterprise Storage. Vendors such as Dell Compellent, HP 3PAR and also EMC with FAST2 position themselves with enhanced tiering techniques to automate LUN and sub-LUN tiering, and customers have been lapping up this feature like little puppies.

However, an up-and-coming segment for SSD usage is positioning the SSDs as extended read or write cache to the existing memory of the systems. NetApp’s FlashCache is a PCIe solution that is basically an extended read cache. An interesting feature of Oracle Solaris ZFS called Hybrid Storage Pool allows the creation of read and write caches using SSDs. The Sun fellas even came up with cool names – ReadZilla and LogZilla – for these Hybrid Storage Pool features.

Basically, I have poured out what I know about SSDs (so far) and I intend to learn more about them. SNIA (Storage Networking Industry Association) has a Technical Working Group for Solid State Storage. I advise readers to check it out.