Silent Data Corruption (SDC) … it’s more prevalent than you think

Have you heard about Silent Data Corruption (SDC)? It’s everywhere and yet in the storage networking world, you can hardly find a storage vendor talking about it.

I did a paper for MNCC (Malaysian National Computer Confederation) a few years ago, and one of the examples I used was what they found at CERN. CERN, the European Organization for Nuclear Research, published a paper in 2007 describing the issue of SDC. Later, in 2008, they found that approximately 38,000 files were corrupted in the 15,000TB of data they had generated. So SDC is very real, and yet to the people in the storage networking industry, where data matters the most, it is one of the least talked-about issues.

What is Silent Data Corruption? Every computer component that we use is NOT perfect. It could be the memory; it could be the network interface cards (NICs); it could be the hard disk; it could also be the bus, the file system or the data block structure. Any computer component, whether hardware or software, that deals with the bits of data is subject to SDC.

Data corruption happens all the time. It is when a bit or a set of bits is changed unintentionally, for various reasons. Some of those reasons are listed below:

  • Hardware errors
  • Data transfer noise
  • Electromagnetic Interference (EMI)
  • Firmware bugs
  • Software bugs
  • Poor electrical current distribution
  • Many more …

And that is why there are published statistics for some hardware components such as memory, NICs and hard disks, and even for protocols such as Fibre Channel. These published statistics talk about BER, or bit-error-rate, which is the occurrence of an erroneous bit in every billion or trillion bits transferred or processed.
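
To give a hedged sense of scale (a minimal Python sketch; the BER and link speed below are round numbers I assumed for illustration, not figures from any particular vendor or standard), a bit-error-rate of 1 in 10^15 on a single, fully utilized 10GbE link already works out to roughly one erroneous bit a day:

  # Back-of-envelope sketch of how often an assumed BER bites on one link.
  # Both numbers below are assumptions for illustration only.
  BER = 1e-15            # assumed bit-error-rate: 1 bad bit per 10^15 bits
  LINK_SPEED_BPS = 10e9  # assumed 10Gbit/sec link, fully utilized

  bits_per_day = LINK_SPEED_BPS * 60 * 60 * 24
  expected_errors_per_day = bits_per_day * BER

  print(f"bits transferred per day: {bits_per_day:.3e}")
  print(f"expected bit errors/day : {expected_errors_per_day:.2f}")
  # Roughly 0.86 errors a day on this one link; multiply across hundreds
  # of links, drives and memory modules and errors become a daily routine.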

And it is also why there are inherent mechanisms within these channels to detect data corruption. We see them all the time in things such as checksums (CRC32, SHA-1, MD5 …), parity and ECC (error-correcting code). Because we can detect the corruption, we see errors and warnings about its existence.
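
As a simple illustration of how such detection works (a minimal Python sketch using the standard library’s CRC32 and SHA-1; the data block here is made up for the example):

  import hashlib
  import zlib

  def fingerprint(data: bytes):
      """Return a (CRC32, SHA-1) fingerprint for a block of data."""
      return zlib.crc32(data), hashlib.sha1(data).hexdigest()

  # Record the fingerprint when the block is written ...
  block = b"a block of data written to disk"
  written = fingerprint(block)

  # ... and verify it again when the block is read back.
  block_read = block  # in real life, this is read back from disk or the wire
  if fingerprint(block_read) != written:
      print("Data corruption detected!")  # detected, hence NOT silent
  else:
      print("Block verified OK")
  # Silent data corruption is precisely the case where no such check exists,
  # or the corruption slips past it, so a flipped bit goes unnoticed.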

However, SILENT data corruption does not appear as errors and warnings, and it does OCCUR! And this problem is becoming more and more prevalent in modern-day disk drives, especially solid state drives (SSDs). As the drive manufacturers come out with more compact, higher-capacity and higher-performance drives, the cell geometry of SSDs is becoming smaller and smaller. This means each cell has a smaller area in which to contain the electrical charge and maintain its bit value, either a 0 or a 1. At the same time, the smaller cell is more sensitive and susceptible to noise, electrical charge leakage and interference from nearby cells, especially as some SSDs have different power modes to address green requirements.

When such things happen, a 0 can look like a 1 or vice versa, and if the error goes undetected, it becomes silent data corruption.

Most of the common storage networking technologies, such as RAID and file systems, were introduced in the ’80s and ’90s, when disks were 9GB or 18GB and Fast Ethernet was the networking standard. Things have changed at a very fast pace, and data growth has been phenomenal. We need to look at storage vendors’ technology more objectively now and get more in-depth about issues such as SDC.

SDC is very real, but until and unless we learn and equip ourselves with the knowledge, don’t take what vendors tell you verbatim. Find out for yourself … and be in control of what you are putting into your IT environment.

Virtualization and cloud aren’t what they are without storage

I was chatting with a friend yesterday and we were discussing virtualization and cloud, the biggest things happening in the IT industry right now. We were talking about the arrival of VMware vSphere 5 and the cool stuff VMware is bringing into the game, pushing the technology juggernaut farther and farther ahead of its rivals Hyper-V, Xen and VirtualBox.

And in the technology section of the newspaper yesterday, I saw news of the Jaring OneCloud offering, and one of the local IT players has just brought in Joyent. Fantastic stuff! But for us in IT, we have been inundated with cloud, cloud and more cloud. The hype, the buzz and the reality. It’s all there, but back to our conversation. We realized that virtualization and cloud aren’t much without storage, the cornerstone of virtualization and cloud. And in the storage networking layer, there is the data management piece, the information infrastructure piece and so on, and yet … why are there so few storage networking professionals out there in our IT scene?

I have been lamenting this for a long time, because it is a problem we have faced for a long time: a shortage of qualified and well-experienced storage networking professionals. There are plenty of jobs out there, but not enough people to meet the demand. As SNIA Malaysia Chairman, it is my duty to work with my committee members from HP, IBM, EMC, NetApp, Symantec and Cisco to create the awareness and, more importantly, the passion to bring the voices of local storage networking professionals together. It has been challenging, but my advice to all those people out there is this – “Why be ordinary when you can become extra-ordinary?”

We have to make others realize that storage networking is what makes virtualization and cloud happen. Join us at SNIA Malaysia and be part of something extra-ordinary. Storage networking IS the foundation of virtualization and cloud. You can’t exclude it.

10Gigabit Ethernet will rule

As far as what next-generation storage networks will look like, 10Gigabit Ethernet (10GbE) is definitely the strongest candidate for the storage network. This is made possible by key enhancements to Ethernet that deliver greater reliability and performance. These enhancements go by several names, such as Data Center Ethernet (a term coined by Cisco) and Converged Enhanced Ethernet (CEE), but probably the more widely used term is DCB, or Data Center Bridging.

Ethernet, so far, has never failed to deliver, and as far as I am concerned, Ethernet will rule for the next 10 years or more. Ethernet has evolved over several generations, from Ethernet running at 10Mbit/sec to Fast Ethernet, then Gigabit Ethernet and now 10Gigabit Ethernet. Pretty soon, it will be looking at 40Gbit/sec and 100Gbit/sec. It is a tremendous protocol, able to evolve and adapt to modern data networks.

Before 10GbE, packet delivery was on a best-effort basis. But today’s networks demand scalability, security, performance and, most of all, reliability. Since the advent of DCB, 10GbE has been fortified with these key technologies:

  • 10GBASE-T – using Cat 6/6A cabling standards, 10GBASE-T delivers low cost, simple UTP (unshielded twisted pair) networking to the masses
  • iWARP – support for iWARP is crucial for RDMA (Remote Direct Memory Access). RDMA, in a nutshell, reduces the overhead of the typical buffer-to-buffer copies in networking by bypassing those bottlenecks and placing the data blocks directly into the memory of the corresponding requesting node.
  • Low-latency cut-through switching at Layer 2, achieved by reading just the header of the packet instead of its entire length. The information contained in the header is sufficient to make a switching/forwarding decision (see the sketch after this list).
  • Energy efficiency, introduced through a low-power idle state and other implementations that make power consumption more proportional to the network utilization rate
  • Congestion notification and priority-based pause frames, which handle 8 different classes of traffic to ensure lossless network delivery
  • Shortest-path, adaptive routing protocols for Ethernet forwarding. TRILL (Transparent Interconnection of Lots of Links) is one of the implementations. Lately, OpenFlow has been jumping on the bandwagon as a viable option, but I need to check out OpenFlow support with 10GbE and DCB.
  • FCoE (Fibre Channel over Ethernet) is all the rage these days, and 10GbE has the ability to carry Fibre Channel traffic. This has sparked an initial frenzy among storage vendors.
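
On the cut-through switching point in the list above, here is a minimal, hypothetical Python sketch of the idea: the forwarding decision needs nothing more than the 14-byte Ethernet header (destination MAC, source MAC, EtherType), so a cut-through switch can start forwarding a frame before the rest of it has even arrived. The MAC addresses and port table are made up for illustration.

  # Hypothetical sketch of a Layer 2 cut-through forwarding decision.
  # Only the 14-byte Ethernet header is examined; the payload never matters.
  forwarding_table = {            # made-up MAC-address-to-port table
      "aa:bb:cc:dd:ee:01": 1,
      "aa:bb:cc:dd:ee:02": 2,
  }

  def forward_port(frame: bytes) -> int:
      """Pick the egress port from the destination MAC in the header alone."""
      dst_mac = ":".join(f"{b:02x}" for b in frame[0:6])
      return forwarding_table.get(dst_mac, 0)  # unknown MACs would be flooded

  # A made-up frame: destination MAC, source MAC, EtherType, then a payload
  # the switch never has to read before it starts forwarding.
  frame = (bytes.fromhex("aabbccddee02") + bytes.fromhex("aabbccddee01")
           + bytes.fromhex("0800") + b"payload the switch ignores")

  print(f"forward out of port {forward_port(frame)}")  # -> port 2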

Of course, last but not least, we are already seeing the sunset of Fibre Channel. While 8Gbps FC has been out for a while, its adoption rate seems to have stalled. Many vendors and customers are still in the 4Gbps range, playing a wait-and-see game. 16Gbps FC has been talked about, but it seems that all the fireworks are with 10Gigabit Ethernet right now. It will rule …

Dell acquiring Force10

What do you think of Dell acquiring Force10? My first reaction was surprise – I was very surprised.

I was in the middle of a conversation with a friend when the RSS feed popped up in front of me – “Dell acquiring Force10”! I cut that conversation short to read the rest of the details … wow, that’s a good buy!

With all the rumors flying around that Brocade was the most obvious choice, Force10 came out of the blue for me. As the euphoria settled down, I thought Dell had made a very smart move. Brocade, unfortunately, is still pretty much a Fibre Channel company, with 75% of its business relying heavily on Fibre Channel and FCoE. Even though Brocade now has Foundry, it has not strongly asserted itself as a front runner and innovator in 10Gigabit Ethernet.

Meanwhile, Force10 has been an up-and-coming force (pun intended) to be reckoned with, strengthening its position as a 10GbE player in the market. And with 10GbE now, and 40GbE or 100GbE coming in the next 2-3 years, Force10 will be riding the wave of the future. Dell can only benefit from that momentum.

Dell has been very, very aggressive in pushing itself into the enterprise storage space. From its acquisition of EqualLogic in 2007 to Exanet, Ocarina and Compellent last year, there is no doubt that Dell wants this space badly.

The first challenge for Dell is to put its story together and convince customers that it is no longer just Dell, the PC/laptop direct seller, but a formidable company capable of providing enterprise solutions, services and support.

The second, and even bigger, challenge is Dell itself: its culture and its mindset have to change. The game has changed; the rules have changed. The enterprise is a totally different ballgame. Is Dell ready? Is Dell ready to change itself?

May the Force(10) be with Dell!

Going Ga Ga over Storage Networking

Before you start thinking that I am ripping off Lady Gaga, this blog’s name of “Storage Gaga” is NOT from Lady Gaga. It’s from Queen’s song “Radio Ga Ga”, which I happened to be listening to in my car.

Why Ga Ga? “Gaga”, according to the Free Dictionary (link: http://www.thefreedictionary.com/gaga), means being crazy about something (at least that’s one of the meanings anyway). That’s what I am. Since leaving my last job – which was on Tuesday (July 19th 2011) this week – I want to do more for storage networking and data management. I want to share the things I find out, the information I have learned and so on.

So watch this space for more info … more on the way.

p/s. This rainy morning, I am going to arrange and organize all my computer books. It’s going to be fun!