Boosting Solid States beyond SATA

Lately, I have been getting deeper and deeper into low-level implementation related to storage technologies. In my previous blog, I was writing my learning adventure with Priority Flow Control (PFC) and intend to further the Data Center Bridging concepts with future blog entries.

Before I left for Sydney for a holiday last week, I got sidetracked into exciting stuff that’s happening in my daily encounters with friends and new friends. 2 significant storage related technologies fell onto my lap. One is NVMe (Non-Volatile Memory express) and the other FPGA (Field Programmable Gate Array).

While this blog is going to be about NVMe, I actually found FPGA much more exciting to me. Through conversations, I found that there are 2 “biggies” in the FPGA world, and they are designed and manufactured by Xilink and Altera. I admit that I have not done my homework on FPGA yet, having just returned from Sydney last night. I will blog about FPGA in future blogs.

But NVMe is also an important technology direction to the storage world as well.

I think most of us are probably already mesmerized by solid state drives. The bombardment of marketing, presentations, advertising and whatever else the vendors do to promote (and self-promote) solid state drives are inundating the intellectual senses of consumers and enterprises alike. And yet, many vendors do not explain both the pros and cons of integrating solid states into their IT environment. Even worse, many don’t even know the strengths and weaknesses of solid states, hence creating some exaggeration that continues to create a spiral vortex of inaccuracies. Like a self-feeding frenzy, the industry seems to have placed solid state storage as the saviour of the enterprise storage world. Go figure with that!

Continue reading

Silent Data Corruption (SDC) …it’s more prevalent that you think

Have you heard about Silent Data Corruption (SDC)? It’s everywhere and yet in the storage networking world, you can hardly find a storage vendor talking about it.

I did a paper for MNCC (Malaysian National Computer Confederation) a few years ago and one of the examples I used was what they found at CERN. CERN, the European Center for Nuclear Research published a paper in 2007 describing the issue of SDC. Later in 2008, they found approximately 38,000 files were corrupted in the 15,000TB of data they generated. Therefore SDC is very real and yet to the people in the storage networking industry, where data matters the most, it is one of the issues that is the least talked about.

What is Silent Data Corruption? Every computer component that we use is NOT perfect. It could be the memory; it could be the network interface cards (NICs); it could be the hard disk; it could also be the bus, the file system, the data block structure. Any computer component, whether it is hardware or software, which deals with the bits of data is subjected to the concern of SDC.

Data corruption happens all the time. It is when a bit or a set of bits is changed unintentionally due to various reasons. Some of the reasons are listed below:

  • Hardware errors
  • Data transfer noise
  • Electromagnetic Interference (EMI)
  • Firmware bugs
  • Software bugs
  • Poor electrical current distribution
  • Many more …

And that is why there are published statistics for some hardware components such as memory, NICs, hard disks, and even protocols such as Fibre Channel. These published statistics talk about BER or bit-error-rate, which is the occurrence of an erroneous bit in every billion or trillion of bits transferred or processed.

And it is also why there are inherent mechanisms within these channels to detect data corruption. We see them all the time in things such as checksums (CRC32, SHA1, MD5 …), parity and ECC (error correction code). Because we can detect them, we see errors and warnings about their existence.

However, SILENT data corruption does not appear as errors and warnings, and they do OCCUR! And this problem is getting more and more prevalent in modern day disk drives, especially solid state drives (SSDs). As the disk manufacturers are coming out with more compact, higher capacity and performance drives, the cell geometry of SSDs are becoming smaller and smaller. This means each cell will have a smaller area to contain the electrical charge and maintain the bit-value, either a -0 or -1. At the same time, the smaller cell is more sensitive and susceptible to noise, electrical charge leakage and interference of nearby cells as some SSDs has different power modes to address green requirements.

When such things happen, a 0 can look like a 1 or vice versa and if the error is undetected, this becomes silent data corruption.

Most common storage networking technology such as RAID or file systems were introduced during the 80’s or 90’s when disks were 9GB, 18GB and so on, and FastEthernet was the standard for networking. Things have changed at a very fast pace, and data growth has been phenomenal. We need to look at storage vendors’ technology more objectively now and get more in-depth about issues such as SDC.

SDC is very real but until and unless we learn and equip ourselves with the knowledge, just don’t take things from vendors verbatim. Find out … and be in control of what you are putting into your IT environment.