OpenZFS dRAID has risen!

We await the 3rd iteration of TrueNAS® SCALE 23.10 codenamed Cobia. 23.10 means October 2023, and we are within weeks of its announcement.

One of the best features I have been waiting for is dRAID or distributed RAID. I have written about it dRAID a couple of years back. It was announced in 2021, in OpenZFS 2.1, but we have not seen an commercial implementation of dRAID … until iXsystems™ TrueNAS® SCALE 23.10. Why am I so excited?

I have followed the technology since Isaac Huang presented dRAID at the OpenZFS Summit in 2015. Through the years ahead, I have seen Isaac presenting dRAID at the summits, and with each iteration, dRAID got closer and closer to be developed into OpenZFS. It was not until 2021, in OpenZFS 2.1 when dRAID became part of filesystem. And now, dRAID is finally in the TrueNAS® SCALE offering.

Knowing RAID resilvering

RAID rebuilding or reconstruction is a painful and potentially risky process. In OpenZFS and ZFS speak, this process is called resilvering. In simple laymen terms, when a drive (or drives) failed in a parity-based RAID volume (eg. RAID-Z1 or RAID-Z2 vdev), the data which was previously in the failed drive is recreated in the newly integrated spare drive. The structural integrity of the RAID volume (and the storage pool) is preserved but the data that was lost is painstakingly remade through the mathematical algorithm of the parity function of the RAID volume.

When hard disk drives were small in capacity like 2TB or less, the RAID resilvering process was probably faster to complete, returning the parity RAID volume to a normal, online state. But today, drives are 22TB and higher, leaving the traditional RAID resilvering process to take days and even weeks. This leads the RAID volume vulnerable to another possible drive failure, weakening the integrity of the RAID volume. Even worse, most of modern day storage arrays have many disk drives, into the thousands even. And yes, solid state drives would probably be faster in the resilvering, but the same mechanics pretty much apply in OpenZFS.

At the same time, the spare drives are assigned physically and designated to the OpenZFS storage pool, and are not part of the vdev until the resilvering process kicks in.

Yes, this is pretty much a physical process that takes time, computing resources and patience. Note the operative word of “physical” here.

dRAID resilvering

dRAID speeds up the RAID resilvering process several folds, returning the RAID volume (or vdev) much faster than traditional OpenZFS RAID resilvering process. It uses a logical (as opposed to physical) RAID layout concept and uses “logical spare drives”. Thus, there will be many spares “blocks” distributed across the entire dRAID zpool, as shown in the diagram below.

Traditional RAID vdev vs dRAID vdev

Continue reading

Disaggregation and Composability vital for AI/DL models to scale

New generations of applications and workloads like AI/DL (Artificial Intelligence/Deep Learning), and HPC (High Performance Computing) are breaking the seams of entrenched storage infrastructure models and frameworks. We cannot continue to scale-up or scale-out the storage infrastructure to meet these inundating fluctuating I/O demands. It is time to look at another storage architecture type of infrastructure technology – Composable Infrastructure Architecture.

Infrastructure is changing. The previous staid infrastructure architecture parts of compute, network and storage have long been thrown of the window, precipitated by the rise of x86 server virtualization almost 20 years now. It triggered a tsunami of virtualizing everything, including storage virtualization, which eventually found a more current nomenclature – Software Defined Storage. Both storage virtualization and software defined storage (SDS) are similar and yet different and should be revered through different contexts and similar goals. This Tech Target article laid out both nicely.

As virtualization raged on, converged infrastructure (CI) which evolved into hyperconverged infrastructure (HCI) went fever pitch for a while. Companies like Maxta, Pivot3, Atlantis, are pretty much gone, with HPE® Simplivity and Cisco® Hyperflex occasionally blipped in my radar. In a market that matured very fast, HCI is now dominated by Nutanix™ and VMware®, with smaller Microsoft®, Dell EMC® following them.

From HCI, the attention of virtualization has shifted something more granular, more scalable in containerization. Despite a degree of complexity, containerization is taking agility and scalability to the next level. Kubernetes, Dockers are now mainstay nomenclature of infrastructure engineers and DevOps. So what is driving composable infrastructure? Have we reached the end of virtualization? Not really.

Evolution of infrastructure. Source: IDC

It is just that one part of the infrastructure landscape is changing. This new generation of AI/ML workloads are flipping the coin to the other side of virtualization. As we see the diagram above, IDC brought this mindset change to get us to Think Composability, the next phase of Infrastructure.

Continue reading

Stating the case for a Storage Appliance approach

I was in Indonesia last week to meet with iXsystems™‘ partner PT Maha Data Solusi. I had the wonderful opportunity to meet with many people there and one interesting and often-replayed question arose. Why aren’t iX doing software-defined-storage (SDS)? It was a very obvious and deliberate question.

After all, iX is already providing the free use of the open source TrueNAS® CORE software that runs on many x86 systems as an SDS solution and yet commercially, iX sell the TrueNAS® storage appliances.

This argument between a storage appliance model and a storage storage only model has been debated for more than a decade, and it does come into my conversations on and off. I finally want to address this here, with my own views and opinions. And I want to inform that I am open to both models, because as a storage consultant, both have their pros and cons, advantages and disadvantages. Up front I gravitate to the storage appliance model, and here’s why.

My story of the storage appliance begins …

Back in the 90s, most of my work was on Fibre Channel and NFS. iSCSI has not existed yet (iSCSI was ratified in 2003). It was almost exclusively on the Sun Microsystems® enterprise storage with Sun’s software resell of the Veritas® software suite that included the Sun Volume Manager (VxVM), Veritas® Filesystem (VxFS), Veritas® Replication (VxVR) and Veritas® Cluster Server (VCS). I didn’t do much Veritas® NetBackup (NBU) although I was trained at Veritas® in Boston in July 1997 (I remembered that 2 weeks’ trip fondly). It was just over 2 months after Veritas® acquired OpenVision. Backup Plus was the NetBackup.

Between 1998-1999, I spent a lot of time working Sun NFS servers. The prevalent networking speed at that time was 100Mbits/sec. And I remember having this argument with a Sun partner engineer by the name of Wong Teck Seng. Teck Seng was an inquisitive fella (still is) and he was raving about this purpose-built NFS server he knew about and he shared his experience with me. I detracted him, brushing aside his always-on tech orgasm, and did not find great things about a NAS storage appliance. Auspex™ was big then, and I knew of them.

I joined NetApp® as Malaysia’s employee #2. It was an odd few months working with a storage appliance but after a couple of months, I started to understand and appreciate the philosophy. The storage Appliance Model made sense to me, even through these days.

Continue reading

All the Sources and Sinks going to Object Storage

The vocabulary of sources and sinks are beginning to appear in the world of data storage as we witness the new addition of data processing frameworks and the applications in this space. I wrote about this in my blog “Rethinking data. processing frameworks systems in real time” a few months ago, introducing my take on this budding new set of I/O characteristics and data ecosystem. I also started learning about the Kappa Architecture (and Lambda as well), a framework designed to craft and develop a set of amalgamated technologies to handle stream processing of a series of data in relation to time.

In Computer Science, sources and sinks are considered external entities that often serve as connectors of input and output of disparate systems. They are often not in the purview of data storage architects. Also often, these sources and sinks are viewed as black boxes, and their inner workings are hidden from the views of the data storage architects.

Diagram from https://developer.here.com/documentation/get-started/dev_guide/shared_content/topics/olp/concepts/pipelines.html

The changing facade of data stream processing presents the constant motion of data, the continuous data being altered as it passes through the many integrated sources and sinks. We are also see much of the data processed in-memory as much as possible. Thus, the data services from a traditional storage model of SAN and NAS may straggle with the requirements demanded by this new generation of data stream processing.

As the world of traditional data storage processing is expanding into data streams processing and vice versa, and the chatter of sources and sinks can no longer be ignored.

Continue reading

I built a 6-node Gluster cluster with TrueNAS SCALE

I haven’t had hands-on with Gluster for over a decade. My last blog about Gluster was in 2011, right after I did a proof-of-concept for the now defunct, Jaring, Malaysia’s first ISP (Internet Service Provider). But I followed Gluster’s development on and off, until I found out that Gluster was a feature in then upcoming TrueNAS® SCALE. That was almost 2 years ago, just before I accepted to offer to join iXsystems™, my present employer.

The eagerness to test drive Gluster (again) on TrueNAS® SCALE has always been there but I waited for SCALE to become GA. GA finally came on February 22, 2022. My plans for the test rig was laid out, and in the past few weeks, I have been diligently re-learning and putting up the scope to built a 6-node Gluster clustered storage with TrueNAS® SCALE VMs on Virtualbox®.

Gluster on OpenZFS with TrueNAS SCALE

Before we continue, I must warn that this is not pretty. I have limited computing resources in my homelab, but Gluster worked beautifully once I ironed out the inefficiencies. Secondly, this is not a performance test as well, for obvious reasons. So, this is the annals along with the trials and tribulations of my 6-node Gluster cluster test rig on TrueNAS® SCALE.

Continue reading

Exploring the venerable NFS Ganesha

As TrueNAS® SCALE approaches its General Availability date in less than 10 days time, one of the technology pieces I am extremely excited about in TrueNAS® SCALE is the NFS Ganesha server. It is still early days to see the full prowess of NFS Ganesha in TrueNAS® SCALE, but the potential of Ganesha’s capabilities in iXsystems™ new scale-out storage technology is very, very promising.

NFS Ganesha

I love Network File System (NFS). It was one of the main reasons I was so attracted to Sun Microsystems® SunOS in the first place. 6 months before I graduated, I took a Unix systems programming course in C in the university. The labs were on Sun 3/60 workstations. Coming from a background of a VAX/VMS system administrator in the school’s lab, Unix became a revelation for me. It completely (and blissfully) opened my eyes to open technology, and NFS was the main catalyst. Till this day, my devotion to Unix remained sacrosanct because of the NFS spark aeons ago.

I don’t know NFS Ganesha. I knew of its existence for almost a decade, but I have never used it. Most of the NFS daemons/servers I worked with were kernel NFS, and these included NFS services in Sun SunOS/Solaris, several Linux flavours – Red Hat®, SuSE®, Ubuntu, BSD variants in FreeBSD and MacOS, the older Unices of the 90s – HP-UX, Ultrix, AIX and Irix along with SCO Unix and Microsoft® XenixNetApp® ONTAP™, EMC® Isilon (very briefly), Hitachi® HNAS (née BlueArc) and of course, in these past 5-6 years FreeNAS®/TrueNAS™.

So, as TrueNAS® SCALE beckons, I took to this weekend to learn a bit about NFS Ganesha. Here are what I have learned.

Continue reading

A conceptual distributed enterprise HCI with open source software

Cloud computing has changed everything, at least at the infrastructure level. Kubernetes is changing everything as well, at the application level. Enterprises are attracted by tenets of cloud computing and thus, cloud adoption has escalated. But it does not have to be a zero-sum game. Hybrid computing can give enterprises a balanced choice, and they can take advantage of the best of both worlds.

Open Source has changed everything too because organizations now has a choice to balance their costs and expenditures with top enterprise-grade software. The challenge is what can organizations do to put these pieces together using open source software? Integration of open source infrastructure software and applications can be complex and costly.

The next version of HCI

Hyperconverged Infrastructure (HCI) also changed the game. Integration of compute, network and storage became easier, more seamless and less costly when HCI entered the market. Wrapped with a single control plane, the HCI management component can orchestrate VM (virtual machine) resources without much friction. That was HCI 1.0.

But HCI 1.0 was challenged, because several key components of its architecture were based on DAS (direct attached) storage. Scaling storage from a capacity point of view was limited by storage components attached to the HCI architecture. Some storage vendors decided to be creative and created dHCI (disaggregated HCI). If you break down the components one by one, in my opinion, dHCI is just a SAN (storage area network) to HCI. Maybe this should be HCI 1.5.

A new version of an HCI architecture is swimming in as Angelfish

Kubernetes came into the HCI picture in recent years. Without the weights and dependencies of VMs and DAS at the HCI server layer, lightweight containers orchestrated, mostly by, Kubernetes, made distribution of compute easier. From on-premises to cloud and in between, compute resources can easily spun up or down anywhere.

Continue reading

How well do you know your data and the storage platform that processes the data

Last week was consumed by many conversations on this topic. I was quite jaded, really. Unfortunately many still take a very simplistic view of all the storage technology, or should I say over-marketing of the storage technology. So much so that the end users make incredible assumptions of the benefits of a storage array or software defined storage platform or even cloud storage. And too often caveats of turning on a feature and tuning a configuration to the max are discarded or neglected. Regards for good storage and data management best practices? What’s that?

I share some of my thoughts handling conversations like these and try to set the right expectations rather than overhype a feature or a function in the data storage services.

Complex data networks and the storage services that serve it

I/O Characteristics

Applications and workloads (A&W) read and write from the data storage services platforms. These could be local DAS (direct access storage), network storage arrays in SAN and NAS, and now objects, or from cloud storage services. Regardless of structured or unstructured data, different A&Ws have different behavioural I/O patterns in accessing data from storage. Therefore storage has to be configured at best to match these patterns, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:

  • Random and Sequential patterns
  • Block sizes of these A&Ws ranging from typically 4K to 1024K.
  • Causal effects of synchronous and asynchronous I/Os to and from the storage

Continue reading

Storage Elephant Compute Birds

Data movement is expensive. Not just costs, but also latency and resources as well. Thus there were many narratives to move compute closer to where the data is stored because moving compute is definitely more economical than moving data. I borrowed the analogy of the 2 animals from some old NetApp® slides which depicted storage as the elephant, and compute as birds. It was the perfect analogy, because the storage is heavy and compute is light.

“Close up of a white Great Egret perching on top of an African Elephant aa Amboseli national park, Kenya”

Before the animals representation came about I used to use the term “Data locality, Data Mobility“, because of past work on storage technology in the Oil & Gas subsurface data management pipeline.

Take stock of your data movement

I had recent conversations with an end user who has been paying a lot of dollars keeping their “backup” and “archive” in AWS Glacier. The S3 storage is cheap enough to hold several petabytes of data for years, because the IT folks said that the data in AWS Glacier are for “backup” and “archive”. I put both words in quotes because they were termed as “backup” and “archive” because of their enterprise practice. However, the face of their business is changing. They are in manufacturing, oil and gas downstream, and the definitions of “backup” and “archive” data has changed.

For one, there is a strong demand for reusing the past data for various reasons and these datasets have to be recalled from their cloud storage. Secondly, their data movement activities still mimicked what they did in the past during their enterprise storage days. It was a classic lift-and-shift when they moved to the cloud, and not taking stock of  their data movements and the operations they ran on these datasets. Still ongoing, their monthly AWS cost a bomb.

Continue reading

The future of Fibre Channel in the Cloud Era

The world has pretty much settled that hybrid cloud is the way to go for IT infrastructure services today. Straddled between the enterprise data center and the infrastructure-as-a-service in public cloud offerings, hybrid clouds define the storage ecosystems and architecture of choice.

A recent Blocks & Files article, “Broadcom server-storage connectivity sales down but recovery coming” caught my attention. One segment mentioned that the server-storage connectivity sales was down 9% leading me to think “Is this a blip or is it a signal that Fibre Channel, the venerable SAN (storage area network) protocol is on the wane?

Fibre Channel Sign

Thus, I am pondering the position of Fibre Channel SANs in the cloud era. Where does it stand now and in the near future? Continue reading