Delphix Archives - Storage Gaga

FDT – Deduplication Reimagined in OpenZFS

By cfheoh | April 18, 2024 - 1:17 pm |April 18, 2024 compression, Deduplication, deduplication, Delphix, Filesystems, FreeNAS, iXsystems, Klara Systems, OpenZFS, Storage Optimization, TrueNAS

1 Comment

Deduplication in OpenZFS has been a bugbear for some years now. As data sets get larger, they have become even more difficult in using the present DeDuplication Table (DDT) method. Deduplication in OpenZFS is often derided as overwhelming and sluggish in performance.

Moreover, there is a common folklore passed on and on about allocating 5GB of RAM for every 1TB to dedupe in OpenZFS. I don’t know where this “sizing” came about. Probably derived from something Jeff Bonwick wrote back in the early days of ZFS. But there is some truth to this “rule of thumb”, commonly passed around in the TrueNAS® circles.

Nevertheless, given the exponential growth of data, and the advancement of processing power in modern day computer systems, the OpenZFS development community has decided to revamp the DDT method. Several prominent luminaries from iXsystems™, Klara Systems and the OpenZFS community have got together in mid-2023 to develop FDT or Fast Dedupe Table. And we got to see FDT announced to the world in the most recent OpenZFS Developer Summit in November 2023.

Fast Dedupe Table (FDT)

Fast Dedupe Table (FDT) is a log-based dedupe. In OpenZFS, all the write block I/Os that come into OpenZFS are coalesced into transaction groups (TXGs), hashed and checksummed, before they are committed to persistent media.

The new implementation in FDT is to put these incoming TXGs checksums and hashes into an append-only log structure in persistent storage, and also tracking the hashed changes in an AVL-tree residing in the memory. An AVL tree is a self-balancing binary search tree structure that is very efficient in searching, thus giving FDT the speed in initiating the deduplication lookups and updates.

OpenZFS Fast Dedupe Table (FDT) in a nutshell

The append-only log structure works hand-in-hand with the AVL tree to accept and stage (including intelligent sorting) the hash entries that are coming in after the TXGs writes. Then at a certain marker, that could be at a particular time-based trigger or a high-water mark, then the entries in the logs and AVL tree are flushed to the ZAP (ZFS Attribute Processor) where the actual full map of the OpenZFS blocks reside.

Continue reading →

OpenZFS dRAID has risen!

By cfheoh | October 14, 2023 - 2:25 pm |October 14, 2023 Appliance, Clusters, Data Availability, Delphix, Disks, Filesystems, FreeNAS, High Performance Computing, Hyperconvergence, Intel, iXsystems, Linux, OpenZFS, RAID, Software Defined Storage, TrueNAS

Knowing RAID resilvering

RAID rebuilding or reconstruction is a painful and potentially risky process. In OpenZFS and ZFS speak, this process is called resilvering. In simple laymen terms, when a drive (or drives) failed in a parity-based RAID volume (eg. RAID-Z1 or RAID-Z2 vdev), the data which was previously in the failed drive is recreated in the newly integrated spare drive. The structural integrity of the RAID volume (and the storage pool) is preserved but the data that was lost is painstakingly remade through the mathematical algorithm of the parity function of the RAID volume.

When hard disk drives were small in capacity like 2TB or less, the RAID resilvering process was probably faster to complete, returning the parity RAID volume to a normal, online state. But today, drives are 22TB and higher, leaving the traditional RAID resilvering process to take days and even weeks. This leads the RAID volume vulnerable to another possible drive failure, weakening the integrity of the RAID volume. Even worse, most of modern day storage arrays have many disk drives, into the thousands even. And yes, solid state drives would probably be faster in the resilvering, but the same mechanics pretty much apply in OpenZFS.

At the same time, the spare drives are assigned physically and designated to the OpenZFS storage pool, and are not part of the vdev until the resilvering process kicks in.

Yes, this is pretty much a physical process that takes time, computing resources and patience. Note the operative word of “physical” here.

dRAID resilvering

dRAID speeds up the RAID resilvering process several folds, returning the RAID volume (or vdev) much faster than traditional OpenZFS RAID resilvering process. It uses a logical (as opposed to physical) RAID layout concept and uses “logical spare drives”. Thus, there will be many spares “blocks” distributed across the entire dRAID zpool, as shown in the diagram below.

Traditional RAID vdev vs dRAID vdev

Continue reading →

OpenZFS with Object Storage

By cfheoh | December 6, 2021 - 8:00 am |December 6, 2021 Amazon Web Services, API, Cloud, Delphix, Filesystems, Minio, NAS, Object Storage, OpenZFS, Software Defined Storage, Storage Area Network, Unified Storage

2 Comments

At AWS re:Invent last week, Amazon Web Services announced Amazon FSx for OpenZFS. This is the 4th managed service under the Amazon FSx umbrella, joining NetApp® ONTAP™, Lustre and Windows File Server. The highly scalable OpenZFS filesystem can provide high throughput and IOPS bandwidth to Amazon EC2, ECS, EKS and VMware® Cloud on AWS.

I am assuming the AWS OpenZFS uses EBS as the block storage backend, given the announcement that it can deliver 4GB/sec of throughput and 160,000 IOPS from the “drives” without caching. How the OpenZFS is provisioned to the AWS clients is well documented in this blog here. It is an absolutely joy (for me) to see the open source OpenZFS filesystem getting the validation and recognization from AWS. This is one hell of a filesystem.

But this blog isn’t about AWS FSx for OpenZFS with block storage. It is about what is coming, and eventually AWS FSx for OpenZFS could expand into AWS’s proficient S3 storage as well. Can OpenZFS integrate with an S3 object storage backend? This blog looks into the burning question.

In the recently concluded OpenZFS Developer Summit 2021, one of the topics was “ZFS on Object Storage“, and the short answer is a resounding YES!

OpenZFS Developer Summit 2021

Continue reading →

Storage Elephant Compute Birds

By cfheoh | November 22, 2021 - 8:00 am |November 21, 2021 Actifio, Amazon Web Services, Analytics, API, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Composable Infrastructure, Data, Data Archiving, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Deduplication, DellEMC, Delphix, Digital Transformation, Disaster Recovery, Edge Computing, Hadoop, Hyperconvergence, IoT, iXsystems, Kubernetes, Machine Learning, NetApp, Nimble Storage, Nutanix, Object Storage, Simplivity, Snapshots, Storage Optimization, Storage Tiering, TrueNAS, Virtualization, VMware

Take stock of your data movement

I had recent conversations with an end user who has been paying a lot of dollars keeping their “backup” and “archive” in AWS Glacier. The S3 storage is cheap enough to hold several petabytes of data for years, because the IT folks said that the data in AWS Glacier are for “backup” and “archive”. I put both words in quotes because they were termed as “backup” and “archive” because of their enterprise practice. However, the face of their business is changing. They are in manufacturing, oil and gas downstream, and the definitions of “backup” and “archive” data has changed.

For one, there is a strong demand for reusing the past data for various reasons and these datasets have to be recalled from their cloud storage. Secondly, their data movement activities still mimicked what they did in the past during their enterprise storage days. It was a classic lift-and-shift when they moved to the cloud, and not taking stock of their data movements and the operations they ran on these datasets. Still ongoing, their monthly AWS cost a bomb.

Continue reading →

Open Source Storage Technology Crafters

By cfheoh | October 4, 2021 - 8:00 am |October 4, 2021 Appliance, Ceph, Cloud, Data Fabric, Data Management, Delphix, Filesystems, FreeNAS, Hadoop, Hadoop Clusters, iXsystems, Linux, Lustre, Minio, NAS, Object Storage, Openstack, OpenZFS, Oracle, Redhat, SMB, Snapshots, SNIA, SoftIron, Software Defined Storage, TrueNAS, Unified Storage, Virtualbox, Virtualization

Storage Assemblers

Anybody can “build” a storage system with open source storage software. Put the software together with any commodity x86 server, and it can function with the basic storage services. Most open source storage software can do the job pretty well. However, once the completed storage technology is put together, can it do the job well enough to serve a business critical end user? I have plenty of sob stories from end users I have spoken to in these many years in the industry related to so-called “enterprise” storage vendors. I wrote a few blogs in the past that related to these sad situations:

We have such storage offerings rigged with cybersecurity risks and holes too. In a recent Unit 42 report, 250,000 NAS devices are vulnerable and exposed to the public Internet. The brands in question are mentioned in the report.

I would categorize these as storage assemblers.

Continue reading →

A FreeNAS Compression Tale

By cfheoh | March 22, 2021 - 9:00 am |March 22, 2021 Algorithm, compression, deduplication, Deduplication, Delphix, Filesystems, FreeNAS, iXsystems, NetApp, OpenZFS, Storage Optimization, TrueNAS

1 Comment

David vs Goliath Credit: Miguel Robledo of https://www.artstation.com/miguel_robledo

David vs Goliath

It was an underdog tale worthy of the biblical book of Samuel. When I first caught wind of how FreeNAS™ compression prowess was going against NetApp® compression and deduplication in one use case, I had to find out more. And the results in this use case was quite impressive considering that FreeNAS™ (now known as TrueNAS® CORE) is the free, open source storage operating system and NetApp® Data ONTAP, is the industry leading, enterprise, “king of the hill” storage data management software.

Certainly a David vs Goliath story.

Compression in FreeNAS

Ah, Compression! That technology that is often hidden, hardly seen and often forgotten.

Compression is a feature within FreeNAS™ that seldom gets the attention. It works, and certainly is a mature form of data footprint reduction (DFR) technology, along with data deduplication. It is switched on by default, and is the setting when creating a dataset, as shown below:

Dataset creation with Compression (lz4) turned on

The default compression algorithm is lz4 which is fast but poor in compression ratio compared to gzip and bzip2. However, lz4 uses less CPU cycles to perform its compression and decompression processing, and thus the impact on FreeNAS™ and TrueNAS® is very low.

Quick Benchmark: Gzip vs Bzip2 vs LZMA vs XZ vs LZ4 vs LZO

NetApp® ONTAP, if I am not wrong, uses lzopro as default – a commercial and optimized version of the open source LZO compression library. In addition, NetApp also has their data deduplication technology as well, something OpenZFS has to improve upon in the future.

The DFR report

This brings us to the use case at one of iXsystems™ customers in Taiwan. The data to be reduced are mostly log files at the end user, and the version of FreeNAS™ is 11.2u7. There are, of course, many factors that affect the data reduction ratio, but in this case of 4 scenarios, the end user has been running this in production for over 2 months. The results:

FreeNAS vs NetApp Data Footprint Reduction

In 2 of the 4 scenarios, FreeNAS™ performed admirably with just the default lz4 compression alone, compared to NetApp® which was running both their inline compression and deduplication.

The intention to post this report is not to show that FreeNAS™ is better in every case. It won’t be, and there are superior data footprint reduction tech out there which can outperform it. But I would expect potential and existing end users to leverage on the compression capability of FreeNAS™ which is getting better all the time.

A better compression algorithm

Followers of OpenZFS are aware of the changing of times with OpenZFS version 2.0. One exciting update is the introduction of the zstd compression algorithm into OpenZFS late last year, and is already in TrueNAS® CORE and Enterprise version 12.x.

What is zstd? zstd is a fast compression algorithm that aims to be as efficient (or better) than gzip, but with better speed closer to lz4, relatively. For a long time, the gzip compression algorithm, from levels 1-9, has been serving very good compression ratio compared to many compression algorithms, lz4 included.

However, the efficiency came at a higher processing price and thus took a longer time. At the other end, lz4 is fast and lightweight, but its reduction ratio efficiency is very poor. zstd intends to be the in-between of gzip and lz4. In the latest results published by Facebook’s github page,

zstd performance benchmark against other compression algorithms

For comparison, zstd (level -1) performed very well against zlib, the data compression library in gzip. It was made known there are 22 levels of compression in zstd but I do not know how many levels are accepted in the OpenZFS development.

At the same time, compression takes advantage of multi-core processing, and actually can speed up disk I/O response because the original dataset to be processed is smaller after the compression reduction.

While TrueNAS® still defaults lz4 compression as of now, you can probably change the default compression with a command

# zfs set compression=zstd-6 pool/dataset

Your choice

TrueNAS® and FreeNAS™ support multiple compression algorithms. lz4, gzip and now zstd. That gives the administrator a choice to assign the right compression algorithm based on processing power, storage savings, and time to get the best out of the data stored in the datasets.

As far as the David vs Goliath tale goes, this real life use case was indeed a good one to share.

Discovering OpenZFS Fusion Pool

By cfheoh | January 25, 2021 - 9:00 am |January 21, 2021 Appliance, Data Availability, Deduplication, deduplication, Deep Learning, Delphix, Disks, Filesystems, FreeNAS, iXsystems, Linux, NetApp, OpenZFS, RAID, Solid State Devices, TrueNAS

1 Comment

Fusion Pool excites me, but unfortunately this new key feature of OpenZFS is hardly talked about. I would like to introduce the Fusion Pool feature as iXsystems™ expands the TrueNAS® Enterprise storage conversations.

I would not say that this technology is revolutionary. Other vendors already have the similar concept of Fusion Pool. The most notable (to me) is NetApp® Flash Pool, and I am sure other enterprise storage vendors have the same. But this is a big deal (for me) for an open source file system in OpenZFS.

What is Fusion Pool (aka ZFS Allocation Classes)?

To understand Fusion Pool, we have to understand the basics of the ZFS zpool. A zpool is the aggregation (borrowing the NetApp® terminology) of vdevs (virtual devices), and vdevs are a collection of physical drives configured with the OpenZFS RAID levels (RAID-0, RAID-1, RAID-Z1, RAID-Z2, RAID-Z3 and a few nested RAID permutations). A zpool can start with one vdev, and new vdevs can be added on-the-fly, expanding the capacity of the zpool online.

There are several types of vdevs prior to Fusion Pool, and this is as of pre-TrueNAS® version 12.0. As shown below, these are the types of vdevs available to the zpool at present.

OpenZFS zpool and vdev types – Credit: Jim Salter and Arstechnica

Fusion Pool is a zpool that integrates with a new, special type of vdev, alongside other normal vdevs. This special vdev is designed to work with small data blocks between 4-16K, and is highly efficient in handling random reading and writing of these small blocks. This bodes well with the OpenZFS file system metadata blocks and other blocks of small files. And the random nature of the Read/Write I/Os works best with SSDs (can be read or write intensive SSDs).

Continue reading →

OpenZFS 2.0 exciting new future

By cfheoh | October 19, 2020 - 7:38 pm |October 19, 2020 Clusters, Datto, Deduplication, deduplication, Delphix, Disks, Filesystems, Flash, FreeNAS, High Performance Computing, IBM, Intel, iXsystems, Joyent, Linux, Lustre, NAS, Nexenta, Oracle, Panasas, Panzura, Performance Caching, RAID, Reliability, Snapshots, Software Defined Storage, Solid State Devices

2 Comments

The OpenZFS (virtual) Developer Summit ended over a weekend ago. I stayed up a bit (not much) to listen to some of the talks because it started midnight my time, and ran till 5am on the first day, and 2am on the second day. Like a giddy schoolboy, I was excited, not because I am working for iXsystems™ now, but I have been a fan and a follower of the ZFS file system for a long time.

History wise, ZFS was conceived at Sun Microsystems in 2005. I started working on ZFS reselling Nexenta in 2009 (my first venture into business with my company nextIQ) after I was professionally released by EMC early that year. I bought a Sun X4150 from one of Sun’s distributors, and started creating a lab server. I didn’t like the workings of NexentaStor (and NexentaCore) very much, and it was priced at 8TB per increment. Later, I started my second company with a partner and it was him who showed me the elegance and beauty of ZFS through the command lines. The creed of ZFS as a volume and a file system at the same time with the CLI had an effect on me. I was in love.

OpenZFS Developer Summit 2020 Logo

Exciting developments

Among the many talks shared in the OpenZFS Developer Summit 2020 , there were a few ideas and developments which were exciting to me. Here are 3 which I liked and I provide some commentary about them.

Block Reference Table
dRAID (declustered RAID)
Persistent L2ARC

Continue reading →

Category Archives: Delphix