As Disk Drive capacity gets larger (and larger), the resilient Filesystem matters

I just got home from the wonderful iXsystems™ Sales Summit in Knoxville, Tennessee. The key highlight was to christen the opening of the iXsystems™ Maryville facility, the key operations center that will house iX engineering, support and part of marketing as well. News of this can be found here.

iX datacenter in the new Maryville facility

Western Digital® has always been a big advocate of iX, and at the Summit, they shared their roadmaps for hard disk drives (HDDs), solid state drives (SSDs) and other storage platforms. I felt like a kid in a candy store because I love all this excitement in the disk drive industry. Who says HDDs are going to be usurped by SSDs?

Several other disk drive manufacturers, including Western Digital®, have announced larger capacity drives. Here is some news from each vendor in recent months.

Other than the AFR (annualized failure rates) numbers published by Backblaze every quarter, the Capacity factor has always been a measurement of high interest in the storage industry.

Larger drives mean …

Larger and larger HDDs present far greater challenges in reading and writing the bits of the data accurately and with adequate disk access times. The disk platter count has bumped up to 10 platters from 9. For a 26TB HDD, the capacity is now at 2.6TB per platter. New technologies have to be introduced to push the drive capacities into an almost ethereal state. In WD’s case, ePMR (energy assisted perpendicular magnetic recording), TSA (triple stage actuator) and helium-based HDDs with HelioSeal® are among the key innovations.

The impact to Data Resiliency.

The demands of larger and larger capacity drives will need a better filesystem to serve data bits accurately. The data bits must be delivered with the resiliency to recover from bit rot, and also with precision and accuracy in reading and writing bits from and into the tiniest real estate at a microscopic level on the disk platters.

Bit rot example

Applications and industries demand better data all the time. Thus, at the storage layer, after the data bits are addressed with CHS (cylinder head sector), enhanced with ADF (advanced disk format) to LBA (logical block addressing), the filesystem is the presentation layer that abstracts the complexities of where the data bits are at the very foundational level.
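To make the addressing step a little more concrete, here is a minimal sketch of the classic CHS-to-LBA conversion. The geometry values (16 heads per cylinder, 63 sectors per track) are illustrative assumptions rather than figures from any drive mentioned here, and the function names are hypothetical.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder=16, sectors_per_track=63):
    """Convert a cylinder/head/sector address to a logical block address.

    CHS sectors are numbered from 1, while LBA blocks are numbered from 0.
    The geometry defaults are assumptions chosen only for illustration.
    """
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector must be between 1 and sectors_per_track")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head must be between 0 and heads_per_cylinder - 1")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder=16, sectors_per_track=63):
    """Convert a logical block address back to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


# The first sector of the second cylinder, under the assumed geometry.
lba = chs_to_lba(1, 0, 1)
print(lba, lba_to_chs(lba))
```

The point of the sketch is that LBA flattens the three-dimensional disk geometry into a single block number, which is the view the filesystem builds on.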

The choice of filesystem matters

There are plenty of volume managers, often for RAID purposes. There are also hundreds, if not thousands, of filesystems. RAID, whatever the level, serves to provide Cost, Performance and Availability (these are the terms I use). Regardless of software or hardware RAID, the technology provides data availability (mirrored and parity RAIDs), with some combined performance of IOPS and sequential throughput to sum up the service time and latency to the requesting agents.
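As a rough illustration of how parity RAID provides availability, the sketch below uses plain single-parity XOR across equal-sized blocks. It is a generic textbook technique, not the implementation of RAID-Z or of any particular RAID product, and the block contents and function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data "disks" plus one parity "disk" (contents are illustrative).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second disk failed and rebuild its block from the rest.
rebuilt = rebuild_missing([data[0], data[2]], parity)
print(rebuilt)
```

Losing any one block is survivable because XOR-ing the parity with the remaining blocks cancels them out and leaves the missing one.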

The filesystem layer demystifies the data bits coming up from the CHS and ADF of the disk drives and the LBA of the data blocks. It presents the concept of folders and files with a logical structure, a hierarchy and a distribution method to serve the data. In summary, the filesystem makes data humanly comprehensible.

Often the RAID function and the filesystem function are two different entities. Not many are both a volume manager and a filesystem at the same time. Several filesystem platforms have stood out in this space, and the ZFS filesystem is one of the most outstanding ones that I know. I have also worked very closely with OpenZFS (sometimes ZFS on Oracle® Solaris™) every day for the past 12 years.

The ZFS filesystem

ZFS is a volume manager AND a filesystem. As a practitioner of storage and filesystems for 30 years, I find that the simplicity and the elegance of having both RAID and a file manager in a storage OS cannot be overstated. Furthermore, at the command line level, ZFS works just as part of the operating system. File related commands such as mv, cp, ls, and many others just glide seamlessly for a system administrator, almost transcendent. And it is at this CLI level that the absolute strength and beauty of ZFS shines, especially for hardcore storage administrators who can appreciate what a great filesystem should be.

Here are a few features of ZFS filesystem related to resiliency.

128-bit (256 quadrillion zettabytes addressable storage – so needed for very, very, very large storage volumes)
Data integrity verification and automatic repair (self healing feature) through active checksums
Copy-on-write where every striped write in each transaction group (TXG) is atomic to ensure data consistency
RAID levels of mirror, Z1, Z2, Z3 for data availability. Newer additions include dRAID and RAID expansion.
Near instantaneous snapshot creation and rollback. Scales to astronomical counts.
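The self healing idea in the list above can be sketched as a toy model: keep a checksum apart from the data, verify it on every read, and repair a bad copy from a good mirror copy. This is not OpenZFS code. SHA-256 is used purely for convenience, and the function names and the two-copy "mirror" are assumptions made for the example.

```python
import hashlib


def checksum(block):
    """Return a hex digest for a block of bytes."""
    return hashlib.sha256(block).hexdigest()


def read_with_self_heal(copies, expected_checksum):
    """Return a verified copy of the block and repair any bad copies in place.

    `copies` is a mutable list of byte strings standing in for mirror disks.
    """
    good = next((c for c in copies if checksum(c) == expected_checksum), None)
    if good is None:
        raise RuntimeError("no copy matches the stored checksum")
    for index, block in enumerate(copies):
        if checksum(block) != expected_checksum:
            copies[index] = good  # overwrite the rotted copy with the good one
    return good


original = b"important data block"
stored_checksum = checksum(original)

# Simulate bit rot: flip a single bit in the first mirror copy.
rotted = bytearray(original)
rotted[0] ^= 0x01
mirror = [bytes(rotted), original]

data = read_with_self_heal(mirror, stored_checksum)
print(data == original, mirror[0] == original)
```

Without a checksum kept separately from the data, the rotted copy would be indistinguishable from the good one, and a plain mirror could return either.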

Here is a capacity limit segment of ZFS from the Wikipedia page.

ZFS 128-bit capacity limits
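For a feel of the arithmetic behind the 128-bit figure, here is a small sketch. It assumes that both "quadrillion" and "zettabyte" are read as binary units (2^50 and 2^70 bytes respectively), which is an interpretation and not something stated in this post.

```python
# Largest byte count addressable with 128 bits.
max_bytes = 2 ** 128

# Binary units, assumed here for the "256 quadrillion zettabytes" figure.
ZEBIBYTE = 2 ** 70            # bytes in one zebibyte
BINARY_QUADRILLION = 2 ** 50  # 1024 ** 5

max_zebibytes = max_bytes // ZEBIBYTE
print(max_zebibytes // BINARY_QUADRILLION)  # 256

# How many 26TB drives (decimal terabytes) would that be?
drive_bytes = 26 * 10 ** 12
print(max_bytes // drive_bytes)
```

Under that reading, 2^128 bytes is 2^58 zebibytes, or 256 times 2^50 of them.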

After Build 134 …

I remember Build 134 well. It was the last version of open source OpenSolaris before Oracle decided to close-source Solaris as a whole. But the movement of open sourced Solaris on x86 lived on through the Illumos project in the beginning. I wrote about it in 2012.

[ January 2012 ] Phoenix rising from OpenSolaris ashes

From Illumos, OpenZFS became a standout project, and it has created many storage technology projects. The most notable one was, of course, FreeNAS™, now known as TrueNAS® CORE.

OpenZFS filesystem resiliency

Maybe I have not said this enough. The purpose of any filesystem is data integrity. The data written is exactly the data read. Just like that … forever as long as the filesystem lives. However, lost in the translation and the articulation of storage, many like to tout storage performance first, then the data protection techniques like snapshots and clones, and so on. Oh, we are the fastest. We can snapshot in seconds. Blah, blah, blah. The list goes on.

I have heard of filesystems failing. Even the enterprise-grade ones. And for the storage filesystems that serve mission critical applications and workloads, the results are disastrous. A complete meltdown at a galaxy scale. That is why every bit of data in storage must be underpinned by the filesystem’s resiliency features, and its ability to preserve data integrity.

As the larger capacity drives enter the market, the role and the importance of the filesystem must be properly explained. It is absolutely paramount to select a good storage platform with a mature, well executed filesystem technology. In my experience, OpenZFS is the best of the best.

As disk capacities grow larger and larger

Here are a few parting thoughts:

Larger capacity HDDs mean larger storage volumes. Can all filesystems address this new capacity scale?
Greater precision and data bit accuracy are now acutely significant in larger capacity HDDs. Can all filesystems provide stronger data integrity and resiliency?
Managing data at the zettabyte scale demands data availability, performance and cost effectiveness, with simplicity built in. Can all filesystems deliver the very foundation of what a storage technology platform should be?

In my experience, having worked with many filesystems such as Lustre, WAFL™, ext 2/3/4, Veritas VxFS, FLARE and DART, HNAS, Unix UFS, NTFS, btrfs, FAT32, ExFAT, CephFS, BeeGFS, GlusterFS, Sistina GFS, LTFS, HDFS, Andrew File System (once), HFS/HFS+, APFS and countless others, OpenZFS is the one that stood out the most for me.

Yes, the filesystem of choice matters. And OpenZFS is integral in iXsystems™’ TrueNAS®.

2 Responses to As Disk Drive capacity gets larger (and larger), the resilient Filesystem matters

Alex Tien says:

May 30, 2022 at 6:16 pm

I am out of the storage world professionally, but I do dabble in it for my homelabs. Do you have a recommendation (HW and/or SW) for any OpenZFS setup suitable for home use? I am currently running a Synology box (btrfs) but feel itchy to build and migrate to a FreeNAS setup.

Perhaps it can be your next post.

- cfheoh says:
  
  June 1, 2022 at 6:14 am
  
  Alex Tien. How are you doing?
  
  If you want, I can send you a link to my eBook on FreeNAS 11.3 (old but still gold). You still got my contact number? Email to chinfah@katanalogic.com