Deploying a MinIO SNMD Object Storage Server in TrueNAS SCALE

[ Preamble ] This deployment of a MinIO SNMD (Single Node Multi Drive) object storage server on TrueNAS® SCALE 24.04 (codename “Dragonfish”) is experimental. I am deploying it in my home lab just for the fun of it. Do not deploy this in any production environment.

I have been contemplating this for quite a while. Which MinIO deployment mode on TrueNAS® SCALE should I work on? There are 3 modes – Standalone, SNMD (Single Node Multi Drives) and MNMD (Multi Node Multi Drives). Of course, the ideal lab experiment is an MNMD deployment, the MinIO cluster, and I am still experimenting with this on my meagre lab resources.

In the end, I decided to implement SNMD since this is, most likely, deployed on top of a TrueNAS® SCALE storage appliance instead of on x86 bare metal or in a Kubernetes cluster on Linux systems. Incidentally, the concept of MNMD on top of TrueNAS® SCALE is “Kubernetes cluster”-like, albeit on a different container platform. At the same time, if this is deployed on TrueNAS® SCALE Enterprise, the dual-controller TrueNAS® storage appliance, the appliance’s active-passive HA architecture will take care of the availability of the “MinIO nodes”. Otherwise, it can be a full MinIO cluster spread and distributed across several TrueNAS® storage appliances (minimum 4 nodes in a 2+2 erasure set) in an MNMD deployment scheme.
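Since SNMD is the mode I settled on, here is a minimal sketch in Python of what a single-node, multi-drive launch could look like if MinIO were started by hand. The binary, dataset paths, ports and credentials are my own assumptions for illustration; on TrueNAS® SCALE the MinIO app is normally deployed through the Apps catalog rather than launched like this.

```python
# Minimal sketch: launch a MinIO SNMD (single node, multi drive) server.
# The dataset paths, ports and credentials below are assumptions.
import os
import subprocess

env = dict(
    os.environ,
    MINIO_ROOT_USER="minioadmin",           # assumed root credentials
    MINIO_ROOT_PASSWORD="minio-secret-key",
)

# SNMD erasure coding needs multiple drive paths on the same node.
# MinIO expands the {1...4} ellipsis notation itself (no shell needed).
subprocess.run(
    [
        "minio", "server",
        "/mnt/tank/minio/data{1...4}",      # four dataset paths, one node
        "--address", ":9000",               # S3 API endpoint
        "--console-address", ":9001",       # web console
    ],
    env=env,
    check=True,
)
```

The key point is simply that SNMD means one node pointing at several drive paths, over which MinIO stripes its erasure sets.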

Ideally, the MNMD deployment should look like this:

MinIO distributed multi-node cluster architecture (credit: MinIO)

Continue reading

Enhancing NAS client resiliency and performance with SMB Multichannel and NFS nconnect

NAS (network attached storage) is obviously the file-level workhorse for shared resources in the network of any organization. SMB (server message block) for Windows environments and NFS (network file system) for Linux platforms are the 2 most prominent protocols that rule the NAS world. Of course, there are also SMB implementations on non-Windows and Linux platforms in the form of Samba and others, as well as NFS implementations on Windows.

As both network file sharing protocols have iterated through versions, the present versions – SMB v3.x and NFS v4.x (and NFS v3 on supported Linux kernel versions) – have evolved well on the client side. Both now have enhanced resiliency and performance improvement features, and there is an underlying similarity in both implementations. This blog looks at the client-side architectures of both.

One TCP connection

NAS is a client-server architecture. Over the network, NAS clients (SMB or NFS) access their corresponding NAS server(s) – SMB or NFS server(s) – through the TCP/IP network.

NAS client-server architecture (Credit: https://hypertecsp.com/en-CA/knowledge-base/nas-vs-san/)

One very important starting point to note is the use of one TCP connection per NAS client-to-server relationship. For both SMB and NFS, there is just one TCP link between the client and the server, even if there are several mapped SMB shares or NFS mount points respectively on the client.

For a long time, this one TCP connection was sufficient for the NAS traffic. But as network file accesses grow, this connection becomes both a single point of failure and a performance bottleneck.
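This is exactly what SMB Multichannel and NFS nconnect address: they let the client open multiple TCP connections to the same server. As a hedged illustration, here is a small Python sketch of the equivalent client-side mount commands on a Linux client; the hostnames, share and export paths, and option values are my assumptions, and recent kernels (roughly 5.3+ for nconnect, 5.5+ for cifs multichannel) are required.

```python
# Hedged sketch: mount an NFS export with nconnect and an SMB share with
# multichannel from a Linux client. Hostnames, paths and option values are
# assumptions for illustration. Run as root.
import subprocess

# NFS: open 8 TCP connections to the same server instead of just one.
subprocess.run(
    ["mount", "-t", "nfs", "-o", "vers=4.1,nconnect=8",
     "nas.example.com:/export/data", "/mnt/nfs-data"],
    check=True,
)

# SMB: negotiate SMB3 multichannel with up to 4 channels.
# (username only; mount.cifs will prompt for the password)
subprocess.run(
    ["mount", "-t", "cifs",
     "-o", "vers=3.1.1,multichannel,max_channels=4,username=labuser",
     "//nas.example.com/data", "/mnt/smb-data"],
    check=True,
)
```

On Windows clients, SMB Multichannel is negotiated automatically when SMB 3.x and multiple (or RSS-capable) NICs are present, so no mount options are involved there.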

Continue reading

FDT – Deduplication Reimagined in OpenZFS

Deduplication in OpenZFS has been a bugbear for some years now. As data sets get larger, they have become even more difficult to handle with the present DeDuplication Table (DDT) method. Deduplication in OpenZFS is often derided as overwhelming in its resource consumption and sluggish in performance.

Moreover, there is a common piece of folklore passed around about allocating 5GB of RAM for every 1TB of data to dedupe in OpenZFS. I don’t know where this “sizing” came from. It was probably derived from something Jeff Bonwick wrote back in the early days of ZFS. But there is some truth to this “rule of thumb”, commonly passed around in TrueNAS® circles.

Nevertheless, given the exponential growth of data and the advancement of processing power in modern day computer systems, the OpenZFS development community has decided to revamp the DDT method. Several prominent luminaries from iXsystems™, Klara Systems and the OpenZFS community got together in mid-2023 to develop FDT, or Fast Dedupe Table. And we got to see FDT announced to the world at the most recent OpenZFS Developer Summit in November 2023.

Fast Dedupe Table (FDT)

Fast Dedupe Table (FDT) is a log-based dedupe implementation. In OpenZFS, all the write block I/Os that come in are coalesced into transaction groups (TXGs), hashed and checksummed, before they are committed to persistent media.

The new implementation in FDT is to put these incoming TXG checksums and hashes into an append-only log structure in persistent storage, while also tracking the hashed changes in an AVL tree residing in memory. An AVL tree is a self-balancing binary search tree structure that is very efficient to search, thus giving FDT its speed in initiating deduplication lookups and updates.

OpenZFS Fast Dedupe Table (FDT) in a nutshell

The append-only log structure works hand-in-hand with the AVL tree to accept and stage (including intelligent sorting of) the hash entries that come in after the TXG writes. Then, at a certain marker – which could be a particular time-based trigger or a high-water mark – the entries in the log and the AVL tree are flushed to the ZAP (ZFS Attribute Processor), where the actual full map of the OpenZFS blocks resides.
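To make the flow a little more concrete, here is a loose conceptual model in Python. This is not OpenZFS code: the flush threshold, the use of plain Python dictionaries in place of the on-disk log and the in-memory AVL tree, and all the names are my own illustrative assumptions.

```python
# Loose conceptual model of FDT's staging idea -- NOT OpenZFS code.
# New block hashes go into an append-only log and an in-memory structure
# (OpenZFS uses an AVL tree; a dict stands in here), then get flushed to
# the main table (standing in for the ZAP) at a high-water mark.
import hashlib

FLUSH_HIGH_WATER = 4            # made-up threshold for illustration

dedupe_log = []                 # append-only staging log (on disk in OpenZFS)
staging = {}                    # in-memory map: hash -> reference count
main_table = {}                 # stands in for the ZAP's full dedupe map

def write_block(data: bytes) -> None:
    """Hash a block, stage the entry, and flush at the high-water mark."""
    digest = hashlib.sha256(data).hexdigest()
    dedupe_log.append(digest)                      # append-only, never rewritten
    staging[digest] = staging.get(digest, 0) + 1   # fast in-memory lookup/update
    if len(dedupe_log) >= FLUSH_HIGH_WATER:
        flush()

def flush() -> None:
    """Merge staged entries into the main table in sorted (hash) order."""
    for digest in sorted(staging):                 # sort before flushing
        main_table[digest] = main_table.get(digest, 0) + staging[digest]
    staging.clear()
    dedupe_log.clear()

for block in (b"aaaa", b"bbbb", b"aaaa", b"cccc", b"aaaa"):
    write_block(block)
flush()
print(main_table)   # duplicate blocks share one hash entry with a higher count
```

The real FDT obviously deals with block pointers, reference counts and crash consistency, but the staging-then-flushing rhythm is the essence of the log-based approach.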

Continue reading

Proxmox storage with TrueNAS iSCSI volumes

A few weeks ago, I decided to wipe clean my entire lab setup running Proxmox 6.2. I wanted to connect the latest version of Proxmox VE 8.0-2 to iSCSI LUNs from the TrueNAS® system I have with me. I thought it would be fun to have the configuration steps and the process documented. This is my journal on how to provision a TrueNAS® CORE iSCSI LUN as Proxmox storage. This iSCSI volume in Proxmox is where the VMs will be installed.

Here is a simplified network diagram of my setup. It will be expanded into a Proxmox cluster with shared storage in the future.

Proxmox and TrueNAS network setup

Preparing the iSCSI LUN provisioning

The iSCSI LUN (logical unit number) is provisioned as a logical disk volume to the Proxmox node, where the initiator-target relationship and connection are established.

This part assumes that a zvol has been created from the zpool. At the same time, the IQN (iSCSI Qualified Name) of the Proxmox node should be known to the TrueNAS® storage, as it establishes the connection between Proxmox (the iSCSI initiator) and TrueNAS® (the iSCSI target).
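For completeness, the zvol itself can be carved out of the pool in the TrueNAS® webGUI, or from the shell with something like the sketch below. The pool name, zvol name, size and block size are assumptions of mine, not prescriptions.

```python
# Hedged sketch: create a zvol to back the iSCSI LUN. The pool/zvol name,
# size and volblocksize are assumptions; in TrueNAS this is normally done
# in the webGUI instead. Run on the TrueNAS shell as root.
import subprocess

subprocess.run(
    ["zfs", "create",
     "-V", "100G",               # a 100 GiB block volume (zvol)
     "-o", "volblocksize=16K",   # assumed block size for VM workloads
     "tank/proxmox-lun01"],      # assumed pool/zvol path
    check=True,
)
```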

The IQN for Proxmox can be found by viewing the content of the /etc/iscsi/initiatorname.iscsi file within the Proxmox shell, as shown in the screenshot below.

Where to find the Proxmox iSCSI IQN

The green box shows the IQN of the Proxmox node, which follows the format iqn.year-month.com.domain:generated-hostname. This will be used during the iSCSI target portal configuration in the TrueNAS® webGUI.
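If you prefer to grab the IQN programmatically rather than eyeballing the file, a small sketch like this would work; the file path is the standard open-iscsi location on Debian-based systems such as Proxmox, and the sample output in the comment is made up.

```python
# Minimal sketch: read the Proxmox node's iSCSI initiator IQN from the
# standard open-iscsi configuration file. The parsing assumes the usual
# single "InitiatorName=iqn...." line format.
from pathlib import Path

def read_initiator_iqn(path: str = "/etc/iscsi/initiatorname.iscsi") -> str:
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith("InitiatorName="):
            return line.split("=", 1)[1]
    raise ValueError(f"No InitiatorName entry found in {path}")

print(read_initiator_iqn())   # e.g. iqn.1993-08.org.debian:01:abcdef123456
```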

Continue reading

OpenZFS dRAID has risen!

We await the 3rd iteration of TrueNAS® SCALE, version 23.10, codenamed Cobia. 23.10 means October 2023, and we are within weeks of its announcement.

One of the best features I have been waiting for is dRAID, or distributed RAID. I wrote about dRAID a couple of years back. It was announced in 2021, in OpenZFS 2.1, but we have not seen a commercial implementation of dRAID … until iXsystems™ TrueNAS® SCALE 23.10. Why am I so excited?

I have followed the technology since Isaac Huang presented dRAID at the OpenZFS Summit in 2015. Through the years that followed, I saw Isaac presenting dRAID at the summits, and with each iteration, dRAID got closer and closer to being developed into OpenZFS. It was not until 2021, in OpenZFS 2.1, that dRAID became part of the filesystem. And now, dRAID is finally in the TrueNAS® SCALE offering.

Knowing RAID resilvering

RAID rebuilding or reconstruction is a painful and potentially risky process. In OpenZFS and ZFS speak, this process is called resilvering. In simple layman’s terms, when a drive (or drives) fails in a parity-based RAID volume (e.g. a RAID-Z1 or RAID-Z2 vdev), the data which was previously on the failed drive is recreated on the newly integrated spare drive. The structural integrity of the RAID volume (and the storage pool) is preserved, but the data that was lost is painstakingly remade through the mathematical algorithm of the RAID volume’s parity function.

When hard disk drives were small in capacity, like 2TB or less, the RAID resilvering process was probably faster to complete, returning the parity RAID volume to a normal, online state. But today, drives are 22TB and higher, leaving the traditional RAID resilvering process to take days and even weeks. This leaves the RAID volume vulnerable to another possible drive failure, weakening the integrity of the RAID volume. Even worse, most modern-day storage arrays have many disk drives, sometimes into the thousands. And yes, solid state drives would probably resilver faster, but the same mechanics pretty much apply in OpenZFS.

At the same time, the spare drives are assigned physically and designated to the OpenZFS storage pool, and are not part of the vdev until the resilvering process kicks in.

Yes, this is pretty much a physical process that takes time, computing resources and patience. Note the operative word of “physical” here.

dRAID resilvering

dRAID speeds up the RAID resilvering process severalfold, returning the RAID volume (or vdev) to a healthy state much faster than the traditional OpenZFS RAID resilvering process. It uses a logical (as opposed to physical) RAID layout concept and uses “logical spare drives”. Thus, there will be many spare “blocks” distributed across the entire dRAID zpool, as shown in the diagram below.

Traditional RAID vdev vs dRAID vdev
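Creating a dRAID vdev follows the OpenZFS syntax of draid[parity][:datad][:childrenc][:sparess]. Below is a hedged sketch of a small example; the pool name, the six device names and the chosen geometry are my assumptions for illustration only.

```python
# Hedged sketch: create a small dRAID vdev with zpool. Pool name, device
# names and geometry are assumptions -- adapt to your own drives before
# running anything like this.
import subprocess

# draid1:4d:6c:1s = single parity, 4 data drives per redundancy group,
# 6 children (total drives) in the vdev, 1 distributed (logical) spare
# whose capacity is spread across all 6 drives.
subprocess.run(
    ["zpool", "create", "labpool",
     "draid1:4d:6c:1s",
     "/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"],
    check=True,
)
```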

Continue reading

Open Source Storage and Data Responsibility

There was a Super Blue Moon a few days ago. It was a rare sky show. Friends of mine who are photo and moon gazing enthusiasts were showing off their digital captures online. One ignorant friend, who was probably a bit envious of the other people’s attention, quipped that his Oppo Reno 10 Pro Plus could take better pictures. The Oppo Reno 10 Pro Plus claims 3x optical zoom and 120x digital zoom. Yes, 120 times!

Yesterday, a WIRED article came out titled “How Much Detail of the Moon Can Your Smartphone Really Capture?” It was a very technical article. I thought the author did an excellent job explaining the physics behind his notes. But I also found the article funny, flippant even, when I juxtaposed it with what my envious friend was saying the other day about his phone’s camera.

Super Blue Moon 2023

Open Source storage expectations and outcomes

I work for iXsystems™. Open Source has been in its DNA for over 30 years. Similarly, I have also worked on Open Source (decades before it was called open source) in my home labs ever since I entered the industry. I had the SoftLanding Linux System on 3.5″ diskettes (Linux kernel 0.99), and I bought a boxed set of the FreeBSD OS from Walnut Creek (photo below). My motivation was to learn as much as possible about the information technology world because I was taking my first steps into building my career (I was also quietly trying to prove my father wrong) in the IT industry.

FreeBSD Boxed Set (circa 1993)

Open source has democratized technology. It has placed the power of very innovative technology into the hands of the common people. With Open Source, I see the IT landscape changing as well, especially for home labbers like myself in the early years. Social media platforms, FAANG (Facebook, Apple, Amazon, Netflix, Google) and the like have amplified that power (to the people). But with that great power comes great responsibility. And some users with little technology background start to have hallucinated expectations and outcomes. Just like my friend with the “powerful” Oppo phone.

Likewise, in my world, I have plenty of anecdotes of these types of open source storage users having wild expectations, but few of the skills needed to make them a reality.

Continue reading

Open Source on my mind

Last week was filled with topics around Open Source software. I want to voice my opinions here (with a bit of ranting), hoping not to rouse too many abhorrent comments from different parties and views. This blog is meant to create conversations, even controversial ones, but we must first agree that there will be disagreements. We must accept disagreements as part of this conversation.

In my 30-year career, Open Source has been a big part of my development and progress. The ideas of freely using (certain) software without any licensing implications, and of such software being openly available, were not always as welcome as they are now. I think the Open Source revolution has created an innovation movement that is still going strong, and it has not only permeated the IT industry completely; Open Source is also now in almost every part of the technology-based industries. The Open Source influence is massive.

Open Source word cloud

In the beginning

In the beginning, in my beginning in 1992, the availability of software and its source code was a closed affair. Coming from a VAX/VMS background (I was a system admin of my mathematics department’s minicomputers), Unix liberated my thinking. My final 6 months in university were spent on systems programming in C, and it completely changed how I wanted my career to take shape. The mantra of “Free as in Freedom” in the General Public License (GPL), which I got to know of much later, sat well with my own tenets in life.

If closed source development models led to proprietary software and a centralized way of distributing licensed software, I would count the Open Source development models as one of the earliest decentralized technology frameworks. Down with the capitalistic corporations (aka Evil Empires)!

It was certainly a wonderful and generous way to make the world what it is today. It is a better world now.

Continue reading

Beyond the WORM with MinIO object storage

I find the terminology of WORM (Write Once Read Many) coming back into IT speak in recent years. In the era of rip and burn, WORM was a natural thing, when many of us “youngsters” used to copy files to a blank CD or DVD. I got to know how WORM worked when I learned that the laser in the CD burning process alters the chemical compound in a segment of the CD’s plastic disc, rendering the “burned” segment unwritable once it was written, though it could still be read many times.

At the enterprise level, I got to know about WORM while working with tape drives and tape libraries in the mid-90s. The objective of WORM is to save and archive data and files in a non-rewritable format for compliance reasons. And it was the data compliance and data protection parts that got me interested in data management. WORM is a big deal in many heavily regulated industries such as finance and banking, insurance, oil and gas, transportation and more.

Obviously things have changed. WORM, while very much alive in the ageless tape industry, has another up-and-coming medium in Object Storage. The new generation of data infrastructure and data management specialists are starting to take notice.

WORM storage – Image from Hubstor (https://www.hubstor.net/blog/write-read-many-worm-compliant-storage/)

I take this opportunity to take MinIO object storage for a spin, creating WORM buckets which can easily be architected as data compliance repositories for many applications across regulated industries. Here are some relevant steps.
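As a hedged illustration of the idea, here is a small sketch using the MinIO Python SDK (pip install minio): a bucket created with object locking enabled behaves as WORM, and a default retention keeps every object immutable for the configured period. The endpoint, credentials, bucket name and retention values are my assumptions for illustration.

```python
# Hedged sketch: create a WORM-style (object-locked) bucket in MinIO using
# the Python SDK. Endpoint, credentials, bucket name and retention period
# are assumptions for illustration.
from datetime import datetime, timedelta

from minio import Minio
from minio.commonconfig import GOVERNANCE
from minio.objectlockconfig import DAYS, ObjectLockConfig
from minio.retention import Retention

client = Minio("minio.example.com:9000",
               access_key="minioadmin",
               secret_key="minio-secret-key",
               secure=False)

# Object locking can only be enabled at bucket creation time.
client.make_bucket("compliance-archive", object_lock=True)

# Default retention: every new object is locked for 30 days in GOVERNANCE mode.
client.set_object_lock_config(
    "compliance-archive",
    ObjectLockConfig(GOVERNANCE, 30, DAYS),
)

# Objects can also be locked individually at upload time.
client.fput_object(
    "compliance-archive", "audit-2023.log", "/tmp/audit-2023.log",
    retention=Retention(GOVERNANCE, datetime.utcnow() + timedelta(days=30)),
)
```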

Continue reading

Stating the case for a Storage Appliance approach

I was in Indonesia last week to meet with iXsystems™‘ partner PT Maha Data Solusi. I had the wonderful opportunity to meet with many people there, and one interesting and often-replayed question arose. Why isn’t iX doing software-defined storage (SDS)? It was a very obvious and deliberate question.

After all, iX already provides free use of the open source TrueNAS® CORE software that runs on many x86 systems as an SDS solution, and yet commercially, iX sells the TrueNAS® storage appliances.

This argument between a storage appliance model and a storage software-only model has been debated for more than a decade, and it does come into my conversations on and off. I finally want to address it here, with my own views and opinions. And I want to state that I am open to both models, because as a storage consultant, both have their pros and cons, advantages and disadvantages. Up front, though, I gravitate towards the storage appliance model, and here’s why.

My story of the storage appliance begins …

Back in the 90s, most of my work was on Fibre Channel and NFS. iSCSI did not exist yet (iSCSI was ratified in 2003). It was almost exclusively on Sun Microsystems® enterprise storage with Sun’s resell of the Veritas® software suite that included the Veritas® Volume Manager (VxVM), Veritas® Filesystem (VxFS), Veritas® Volume Replicator (VVR) and Veritas® Cluster Server (VCS). I didn’t do much Veritas® NetBackup (NBU), although I was trained at Veritas® in Boston in July 1997 (I remember that 2-week trip fondly). It was just over 2 months after Veritas® acquired OpenVision. Backup Plus was what became NetBackup.

Between 1998 and 1999, I spent a lot of time working on Sun NFS servers. The prevalent networking speed at that time was 100Mbit/sec. And I remember having this argument with a Sun partner engineer by the name of Wong Teck Seng. Teck Seng was an inquisitive fella (still is), and he was raving about this purpose-built NFS server he knew about and shared his experience with me. I brushed him off, dismissing his always-on tech enthusiasm, and did not find great things in a NAS storage appliance. Auspex™ was big then, and I knew of them.

I joined NetApp® as Malaysia’s employee #2. It was an odd few months working with a storage appliance, but after a couple of months, I started to understand and appreciate the philosophy. The storage appliance model made sense to me, and it still does these days.

Continue reading