I have taken some downtime from my blog since late October. Part of my “hiatus” was due to an illness which had affected my right kidney, but I am happy to announce that I am well again. During this period, I spent a lot of time reading loads of storage technology announcements and their marketing calls, and almost every single one of them touts Performance as if it is the single “sellable” feature of the respective storage vendor. None ever positions data integrity, and the technology behind it, as what I believe is the most important and most fundamental feature of any storage technology: reading the right data exactly as it was written into the storage array.
[ Note: Data integrity is even more critical in cloud storage, and data corruption, especially the silent kind, is even more acute in the clouds ]
Sure, this fundamental feature sounds like a given in any storage array but believe me, there are enterprise storage arrays which have failed to deliver this simple feature properly. Throughout my storage career, I have had end users coming to me with database corruption or file corruption, unable to access their data in an acceptable manner. Data corruption is real, folks!
After several weeks of reading this stuff, I got jaded with so many storage vendors playing leapfrog with their announcements and their millions-of-IOPS boasts.
The 3 legged stool
Rewind to circa 2012, just about the time when EMC® acquired XtremIO™. XtremIO™ was a nascent All-Flash startup, and many, including yours truly, really saw the EMC® acquisition as being all about a high-performance storage array. I was having an email conversation with Shahar Frank, one of the co-founders of XtremIO™, and expressing my views about their performance. What Shahar replied surprised me.
The fundamental strength of a storage array was like a 3-legged stool. 2 legs of the stool would be Performance and Protection, but with only 2 legs, the person sitting on the stool would fall. The 3rd leg would stabilize the balance of the stool, and this 3rd leg was Reliability. This stumped me because XtremIO™’s most sellable feature was Performance. But the wisdom of Shahar pointed to Reliability, the least exciting and the dullest of the 3. He was brilliant, of course, and went on to found ElastiFile (acquired by Google™), but that’s another story for another day.
Reliability is Trust in OpenZFS
Picking up the architecture of file systems is a passion of mine. The Reliability aspect of OpenZFS is one of the things which endears it to me as, IMHO, the best file system in the industry today. And the OpenZFS feature that best exemplifies Reliability is the Data Integrity Verification and Automatic Repair feature, a.k.a. “Self Healing”. Let us revisit this most fundamental (often dull) feature that underpins the OpenZFS file system, and the storage arrays that operate with it.
- In 2011, I wrote a raw piece detailing ONTAP™ vs ZFS
- Last month, I wrote about some compelling new features in “OpenZFS 2.0 exciting new future”
OpenZFS checksums every data block and the metadata of the data block. The checksumming is end-to-end, meaning that it provides a superior data integrity framework compared to many other RAID implementations and file systems. Copying verbatim from the bullet points on the OpenZFS GitHub page, the strengths of end-to-end checksums are:
- detects data corruption upon reading from media
- blocks that are detected as corrupt are automatically repaired if possible, by using the RAID protection in suitably configured pools, or redundant copies
- periodic scrubs can check data to detect and repair latent media degradation (bit rot) and corruption from other sources
- checksums on ZFS replication streams, zfs send and zfs receive, ensure the data received is not corrupted by intervening storage or transport mechanisms
Whenever data is written to the OpenZFS file system, a checksum is created for the data and the metadata of the data block. The metadata and the checksum are part of the block pointer in the file system tree, so the checksum is kept apart from the data it protects. When the written data block is read, its checksum is recomputed and compared with the checksum that was stored when the block was written. If the two do not match, the data in the data block is erroneous, and thus corrupted.
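To make the write-then-verify idea concrete, here is a minimal Python sketch. It is a toy model, not OpenZFS code: `ToyDisk`, `BlockPointer` and `ChecksumError` are names I made up for illustration, and I am assuming SHA-256 as the checksum simply because it ships with Python's standard library.

```python
import hashlib
from dataclasses import dataclass


class ChecksumError(Exception):
    """Raised when a block's recomputed checksum does not match its pointer."""


@dataclass
class BlockPointer:
    # Hypothetical stand-in for a block pointer: it records where the block
    # lives and the checksum computed at write time.
    address: int
    checksum: bytes


class ToyDisk:
    """A dictionary pretending to be a disk. It stores data only, no checksums."""

    def __init__(self):
        self.blocks = {}

    def write(self, address: int, data: bytes) -> BlockPointer:
        self.blocks[address] = bytes(data)
        # The checksum goes into the pointer, not next to the data.
        return BlockPointer(address, hashlib.sha256(data).digest())

    def read(self, pointer: BlockPointer) -> bytes:
        data = self.blocks[pointer.address]
        # Recompute on every read and compare with the stored checksum.
        if hashlib.sha256(data).digest() != pointer.checksum:
            raise ChecksumError(f"block {pointer.address} is corrupted")
        return data
```

Because the checksum travels with the pointer rather than with the block, a block that silently changes on the toy disk (say, `disk.blocks[0] = b"garbage"`) cannot also "fix up" its own checksum, and the next `read` raises `ChecksumError` instead of returning wrong data.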
In a mirrored (RAID-1) or a parity RAID (RAID-Z1/Z2) configuration, OpenZFS will then read another copy of the data block it read earlier (or, with parity RAID, reconstruct the block from parity). If this copy returns a good checksum, the data is good. This good data is returned to the reader and is also used to overwrite the bad data, repairing the corrupted block. This self-healing feature is in action in the diagram above.
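The mirrored case can be sketched in a few lines of Python as well. Again, this is only a toy under my own assumptions, not how OpenZFS is implemented: `ToyMirror` and its `repairs` counter are hypothetical names, SHA-256 stands in for the checksum, and parity reconstruction (RAID-Z) is left out entirely.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class ToyMirror:
    """N dictionaries pretending to be the sides of a mirror."""

    def __init__(self, copies: int = 2):
        self.sides = [dict() for _ in range(copies)]
        self.repairs = 0  # how many bad copies have been rewritten

    def write(self, address: int, data: bytes) -> bytes:
        for side in self.sides:
            side[address] = bytes(data)
        # The caller keeps the checksum, as a block pointer would.
        return checksum(data)

    def read(self, address: int, expected: bytes) -> bytes:
        bad_sides = []
        for side in self.sides:
            data = side[address]
            if checksum(data) == expected:
                # Good copy found: rewrite every bad copy seen so far.
                for bad in bad_sides:
                    bad[address] = data
                    self.repairs += 1
                return data
            bad_sides.append(side)
        # No copy verified, so there is nothing to heal from.
        raise IOError(f"all copies of block {address} failed verification")
```

Note that this sketch only repairs the bad copies it trips over before finding a good one; copies it never reads stay unchecked, which is the gap a periodic scrub is meant to cover.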
Instead of Performance, why not Reliability?
Every storage technology and every file system has reliability as the #1 feature baked in. The self-healing feature I described in OpenZFS is not unique either, because every storage vendor and storage project has its own version of checksum validation and automatic repair.
But what irked me was the lack of promotion of the Reliability feature, especially any explanation from the perspective of Data Integrity. I have put forth a deeper dive into how OpenZFS uses checksum validation and its unique RAID implementation to repair bad data and strengthen Data Integrity.
I am hoping every storage vendor will come forward to share in more detail how they do self-healing and how data integrity is preserved in their storage OS and/or file system. It is a 2-way street: storage architects, cloud storage engineers and practitioners should demand to know too, instead of just accepting the marketing words and simple brushoffs from pre-sales guys who just regurgitate what is on their slide decks, especially in Asia.
We would definitely appreciate this favour, instead of just buying into “millions, and gazillions of IOPS” all the time.