At AWS re:Invent last week, Amazon Web Services announced Amazon FSx for OpenZFS. This is the fourth managed service under the Amazon FSx umbrella, joining NetApp® ONTAP™, Lustre and Windows File Server. The highly scalable OpenZFS filesystem can deliver high throughput and IOPS to Amazon EC2, ECS, EKS and VMware® Cloud on AWS.
I am assuming that the AWS OpenZFS service uses EBS as the block storage backend, given the announcement that it can deliver 4GB/sec of throughput and 160,000 IOPS from the “drives” without caching. How OpenZFS is provisioned to AWS clients is well documented in this blog here. It is an absolute joy (for me) to see the open source OpenZFS filesystem getting validation and recognition from AWS. This is one hell of a filesystem.
But this blog isn’t about AWS FSx for OpenZFS with block storage. It is about what is coming: eventually, AWS FSx for OpenZFS could expand onto AWS’s S3 storage as well. Can OpenZFS integrate with an S3 object storage backend? This blog looks into that burning question.
Making it happen
The ZFS filesystem is known to work with block devices. Exposing object storage as blocks to ZFS requires a native connector into object storage via S3, the de facto communication protocol. The ZFS architecture has been reworked to include this integration with object storage, and it is shown below:
I have highlighted 3 green boxes in the architecture diagram above. These are the 3 new foundational ZFS components, named Object Store, Zetta Object and Zetta Cache.
There is a new vdev (virtual device) type in OpenZFS called Object Store. Instead of sitting on the usual hard disk drives, SSDs or other block devices, the Object Store vdev takes the block I/O calls (reads and writes) that arrive through the ZFS I/O scheduler (ZIO) and links them to the object storage backend.
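To make the idea of "a vdev that is not a disk" concrete, here is a minimal Python sketch. It is my own illustration and not OpenZFS code: the `Vdev` interface, the class names and the key naming are all hypothetical, a `bytearray` plays the disk and a `dict` plays the S3 bucket. For simplicity it stores exactly one block per object, which is the naive form of the idea.

```python
from abc import ABC, abstractmethod


class Vdev(ABC):
    """Hypothetical minimal vdev interface: fixed-size blocks addressed by number."""

    @abstractmethod
    def write_block(self, block_no: int, data: bytes) -> None: ...

    @abstractmethod
    def read_block(self, block_no: int) -> bytes: ...


class DiskVdev(Vdev):
    """Stands in for a conventional block device; a bytearray plays the disk."""

    def __init__(self, n_blocks: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.disk = bytearray(n_blocks * block_size)

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        start = block_no * self.block_size
        self.disk[start:start + self.block_size] = data

    def read_block(self, block_no: int) -> bytes:
        start = block_no * self.block_size
        return bytes(self.disk[start:start + self.block_size])


class ObjectStoreVdev(Vdev):
    """Same interface, but every block I/O becomes a PUT or GET on an object.

    A dict stands in for the S3 bucket and the key format is invented.
    """

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.bucket: dict[str, bytes] = {}

    def _key(self, block_no: int) -> str:
        return f"pool/blk/{block_no:016x}"

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.bucket[self._key(block_no)] = bytes(data)  # PUT

    def read_block(self, block_no: int) -> bytes:
        # GET; a block that was never written reads back as zeros.
        return self.bucket.get(self._key(block_no), bytes(self.block_size))
```

The point of the sketch is that the layer above only sees `read_block` and `write_block`, so it does not need to know whether the blocks land on a disk or in a bucket.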
Between the Object Store and the object storage backend is the Zetta Object, which converts the block I/O requests into object storage requests. The Zetta Object is critical because it is the “translator”, or the bridge, between the ZFS filesystem and the object storage. It handles the pairing of both worlds, and the mapping of Block IDs to Object IDs. Note that the mapping of blocks to objects is not a one-to-one pairing. Several blocks may be coalesced into one object. This is to improve performance, especially when dealing with small-block (< 16KB) transactional workloads.
Zetta Object also manages the S3 communication.
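The block-to-object coalescing described above can be sketched roughly as follows. This is an illustration under my own assumptions, not the Zetta Object implementation: `CoalescingTranslator`, `Extent` and the 64KB flush threshold are hypothetical, and a `dict` stands in for the S3 bucket. Small block writes are buffered, packed into one object, and each Block ID is mapped to an (Object ID, offset, length) triple.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    """Where one block lives inside a coalesced object."""
    object_id: int
    offset: int
    length: int


@dataclass
class CoalescingTranslator:
    """Hypothetical block-to-object translator; a dict plays the S3 bucket."""
    max_object_bytes: int = 64 * 1024
    bucket: dict = field(default_factory=dict)     # object_id -> bytes
    block_map: dict = field(default_factory=dict)  # block_id -> Extent
    _pending: list = field(default_factory=list)   # (block_id, data) not yet flushed
    _pending_bytes: int = 0
    _next_object_id: int = 0

    def write(self, block_id: int, data: bytes) -> None:
        """Buffer a block; flush once enough bytes have accumulated."""
        self._pending.append((block_id, data))
        self._pending_bytes += len(data)
        if self._pending_bytes >= self.max_object_bytes:
            self.flush()

    def flush(self) -> None:
        """Pack all buffered blocks into one object and record their extents."""
        if not self._pending:
            return
        object_id = self._next_object_id
        self._next_object_id += 1
        payload = bytearray()
        for block_id, data in self._pending:
            self.block_map[block_id] = Extent(object_id, len(payload), len(data))
            payload += data
        self.bucket[object_id] = bytes(payload)  # one PUT carries many blocks
        self._pending.clear()
        self._pending_bytes = 0

    def read(self, block_id: int) -> bytes:
        """Serve from the write buffer if present, else slice the object."""
        for pending_id, data in reversed(self._pending):
            if pending_id == block_id:
                return data
        extent = self.block_map[block_id]
        blob = self.bucket[extent.object_id]  # one GET, then slice locally
        return blob[extent.offset:extent.offset + extent.length]
```

With 4KB blocks and a 16KB threshold, four writes turn into a single object instead of four separate requests. The sketch leaves out everything that makes the real problem hard, such as reclaiming space when a block is overwritten and surviving a crash before the flush.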
Lastly, Zetta Cache is a quasi Unix-like filesystem implementation that should run on persistent storage media. It has several functions that improve the responsiveness of the Object Store. It stores the indexed mapping of Block IDs to Object IDs (though not the entire map), the metadata related to the allocated blocks and objects, as well as frequently requested object data and written data blocks. Cache entries are populated and evicted using an LRU (least recently used) algorithm to keep things efficient.
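For readers unfamiliar with LRU, here is a small Python sketch of the eviction policy. It is only an in-memory illustration with hypothetical names (`LRUBlockCache`, `fetch`); the Zetta Cache itself is described as living on persistent media and holding more than data blocks, which this sketch does not attempt to model.

```python
from collections import OrderedDict
from typing import Callable


class LRUBlockCache:
    """Hypothetical read-through cache that evicts the least recently used block."""

    def __init__(self, capacity: int, fetch: Callable[[int], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.fetch = fetch  # called on a miss, e.g. a GET against the object store
        self.entries: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id: int) -> bytes:
        if block_id in self.entries:
            self.entries.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.entries[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self._insert(block_id, data)
        return data

    def put(self, block_id: int, data: bytes) -> None:
        """Cache a freshly written block so later reads avoid the backend."""
        self._insert(block_id, data)

    def _insert(self, block_id: int, data: bytes) -> None:
        self.entries[block_id] = data
        self.entries.move_to_end(block_id)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry
```

A cache built as `LRUBlockCache(2, backend.__getitem__)` over a plain `dict` backend keeps only the two most recently touched blocks; anything older is fetched from the backend again on its next read.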
ZFS on object storage presentations
I want to say that what I have written here does not do the subject enough justice. What I have shared so far is just the tip of the iceberg. There is plenty more, and it is covered in the video from the OpenZFS Developer Summit 2021 below:
The Zetta Cache video presentation is below:
A lot of thought, consideration and improvement is still needed to bring OpenZFS on object storage to an optimal stage. It will not be a flip of a switch, but the ball has started rolling. The momentum can only get bigger from here on, and the outcome, better than before!
Many stateful applications such as databases demand the ACID properties of atomicity, consistency, isolation and durability. These applications work comfortably and confidently with POSIX-compliant filesystems and with NAS and SAN storage services. This new integration with object storage provides the best of both worlds, where stateful applications do not have to leave their comfort zone and are still able to take advantage of the benefits of object storage.
The possibilities, once OpenZFS with object storage lands in a future OpenZFS release, are very exciting. Think about having a highly transactional server housed on-premises using all kinds of storage services (SAN, NAS, S3) everywhere, with OpenZFS as the sole aggregator and champion of its data. As this implementation matures and becomes an OpenZFS feature, it would be able to work with AWS’s S3 storage and other S3-compatible cloud storage.
Trust OpenZFS to make good use of object storage.