I built a 6-node Gluster cluster with TrueNAS SCALE

I haven’t had hands-on with Gluster for over a decade. My last blog about Gluster was in 2011, right after I did a proof-of-concept for the now defunct, Jaring, Malaysia’s first ISP (Internet Service Provider). But I followed Gluster’s development on and off, until I found out that Gluster was a feature in then upcoming TrueNAS® SCALE. That was almost 2 years ago, just before I accepted to offer to join iXsystems™, my present employer.

The eagerness to test drive Gluster (again) on TrueNAS® SCALE has always been there but I waited for SCALE to become GA. GA finally came on February 22, 2022. My plans for the test rig was laid out, and in the past few weeks, I have been diligently re-learning and putting up the scope to built a 6-node Gluster clustered storage with TrueNAS® SCALE VMs on Virtualbox®.

Gluster on OpenZFS with TrueNAS SCALE

Before we continue, I must warn that this is not pretty. I have limited computing resources in my homelab, but Gluster worked beautifully once I ironed out the inefficiencies. Secondly, this is not a performance test as well, for obvious reasons. So, this is the annals along with the trials and tribulations of my 6-node Gluster cluster test rig on TrueNAS® SCALE.

Continue reading

Storage IO straight to GPU

The parallel processing power of the GPU (Graphics Processing Unit) cannot be denied. One year ago, nVidia® overtook Intel® in market capitalization. And today, they have doubled their market cap lead over Intel®,  [as of July 2, 2021] USD$510.53 billion vs USD$229.19 billion.

Thus it is not surprising that storage architectures are changing from the CPU-centric paradigm to take advantage of the burgeoning prowess of the GPU. And 2 announcements in the storage news in recent weeks have caught my attention – Windows 11 DirectStorage API and nVidia® Magnum IO GPUDirect® Storage.

nVidia GPU

Exciting the gamers

The Windows DirectStorage API feature is only available in Windows 11. It was announced as part of the Xbox® Velocity Architecture last year to take advantage of the high I/O capability of modern day NVMe SSDs. DirectStorage-enabled applications and games have several technologies such as D3D Direct3D decompression/compression algorithm designed for the GPU, and SFS Sampler Feedback Streaming that uses the previous rendered frame results to decide which higher resolution texture frames to be loaded into memory of the GPU and rendered for the real-time gaming experience.

Continue reading

Is Software Defined right for Storage?

George Herbert Leigh Mallory, mountaineer extraordinaire, was once asked “Why did you want to climb Mount Everest?“, in which he replied “Because it’s there“. That retort demonstrated the indomitable human spirit and probably exemplified best the relationship between the human being’s desire to conquer the physical limits of nature. The software of humanity versus the hardware of the planet Earth.

Juxtaposing, similarities can be said between software and hardware in computer systems, in storage technology per se. In it, there are a few schools of thoughts when it comes to delivering storage services with the notable ones being the storage appliance model and the software-defined storage model.

There are arguments, of course. Some are genuinely partisan but many a times, these arguments come in the form of the flavour of the moment. I have experienced in my past companies touting the storage appliance model very strongly in the beginning, and only to be switching to a “software company” chorus years after that. That was what I meant about the “flavour of the moment”.

Software Defined Storage

Continue reading

A Paean to NFS

It is certainly encouraging to see both NAS protocols, NFS and SMB, featured well in the latest VMware® vSAN 7 Update 1 release. The NFS v3 and v4.1 support was already in vSAN 7.0 when it was earlier announced as part of its Native File Services for vSAN. But some years ago, NFS was not always the primary storage protocol of choice. SAN protocols, Fibre Channel and iSCSI, were almost always designated to serve enterprise applications. At the client side, Windows became prominent, and the SMB/CIFS protocol dominated the landscape of the desktop. This further pushed NFS into the back closet.

NFS or Network File System has its naysayers. The venerable, but often maligned distributed network file protocol is 36 years today. In storage vendors such as NetApp®, VAST Data, Pure Storage FlashBlade, and Dell EMC Isilon, NFS is still positioned as the primary file protocol for manufacturing testers on the shop floor, EDA/eCAD applications, seismic and subsurface applications in Oil & Gas and many more. In another development, just like its presence in the vSAN Native Services,, NFS has also quietly embedded itself into many storage platforms to serve the data platform services within the respective framework itself.

And I have experienced NFS from the client side to the enterprise applications and more, and I take this opportunity to pay tribute.

NFS (Network File System) client server network

NFS (Network File System) client server network

Continue reading

Open Source and Open Standards open the Future

[Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

Western Digital dived into Storage Field Day 19 in full force as they did in Storage Field Day 18. A series of high impact presentations, each curated for the diverse requirements of the audience. Several open source initiatives were shared, all open standards to address present inefficiencies and designed and developed for a greater future.

Zoned Storage

One of the initiatives is to increase the efficiencies around SMR and SSD zoning capabilities and removing the complexities and overlaps of both mediums. This is the Zoned Storage initiatives a technical working proposal to the existing NVMe standards. The resulting outcome will give applications in the user space more control on the placement of data blocks on zone aware devices and zoned SSDs, collectively as Zoned Block Device (ZBD). The implementation in the Linux user and kernel space is shown below:

Continue reading

Storage Performance Considerations for AI Data Paths

The hype of Deep Learning (DL), Machine Learning (ML) and Artificial Intelligence (AI) has reached an unprecedented frenzy. Every infrastructure vendor from servers, to networking, to storage has a word to say or play about DL/ML/AI. This prompted me to explore this hyped ecosystem from a storage perspective, notably from a storage performance requirement point-of-view.

One question on my mind

There are plenty of questions on my mind. One stood out and that is related to storage performance requirements.

Reading and learning from one storage technology vendor to another, the context of everyone’s play against their competitors seems to be  “They are archaic, they are legacy. Our architecture is built from ground up, modern, NVMe-enabled“. And there are more juxtaposing, but you get the picture – “We are better, no doubt“.

Are the data patterns and behaviours of AI different? How do they affect the storage design as the data moves through the workflow, the data paths and the lifecycle of the AI ecosystem?

Continue reading

Scaling new HPC with Composable Architecture

[Disclosure: I was invited by Dell Technologies as a delegate to their Dell Technologies World 2019 Conference from Apr 29-May 1, 2019 in the Las Vegas USA. Tech Field Day Extra was an included activity as part of the Dell Technologies World. My expenses, travel, accommodation and conference fees were covered by Dell Technologies, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

Deep Learning, Neural Networks, Machine Learning and subsequently Artificial Intelligence (AI) are the new generation of applications and workloads to the commercial HPC systems. Different from the traditional, more scientific and engineering HPC workloads, I have written about the new dawn of supercomputing and the attractive posture of commercial HPC.

Don’t be idle

From the business perspective, the investment of HPC systems is high most of the time, and justifying it to the executives and the investors is not easy. Therefore, it is critical to keep feeding the HPC systems and significantly minimize the idle times for compute, GPUs, network and storage.

However, almost all HPC systems today are inflexible. Once assigned to a project, the resources pretty much stay with the project, even when the workload processing of the project is idle and waiting. Of course, we have to bear in mind that not all resources are fully abstracted, virtualized and software-defined whereby you can carve out pieces of the hardware and deliver a percentage of that resource. Case in point is the CPU, where you cannot assign certain clock cycles of CPU to one project and another half to the other. The technology isn’t there yet. Certain resources like GPU is going down the path of Virtual GPU, and into the realm of resource disaggregation. Eventually, all resources of the HPC systems – CPU, memory, FPGA, GPU, PCIe channels, NVMe paths, IOPS, bandwidth, burst buffers etc – should be disaggregated and pooled for disparate applications and workloads based on demands of usage, time and performance.

Hence we are beginning to see the disaggregated HPC systems resources composed and built up the meet the diverse mix and needs of HPC applications and workloads. This is even more acute when a AI project might grow cold, but the training of AL/ML/DL workloads continues to stay hot

Liqid the early leader in Composable Architecture

Continue reading

WekaIO controls their performance destiny

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in the Silicon Valley USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

I was first introduced to WekaIO back in Storage Field Day 15. I did not blog about them back then, but I have followed their progress quite attentively throughout 2018. 2 Storage Field Days and a year later, they were back for Storage Field Day 18 with a new CTO, Andy Watson, and several performance benchmark records.

Blowout year

2018 was a blowout year for WekaIO. They have experienced over 400% growth, placed #1 in the Virtual Institute IO-500 10-node performance challenge, and also became #1 in the SPEC SFS 2014 performance and latency benchmark. (Note: This record was broken by NetApp a few days later but at a higher cost per client)

The Virtual Institute for I/O IO-500 10-node performance challenge was particularly interesting, because it pitted WekaIO against Oak Ridge National Lab (ORNL) Summit supercomputer, and WekaIO won. Details of the challenge were listed in Blocks and Files and WekaIO Matrix Filesystem became the fastest parallel file system in the world to date.

Control, control and control

I studied WekaIO’s architecture prior to this Field Day. And I spent quite a bit of time digesting and understanding their data paths, I/O paths and control paths, in particular, the diagram below:

Starting from the top right corner of the diagram, applications on the Linux client (running Weka Client software) and it presents to the Linux client as a POSIX-compliant file system. Through the network, the Linux client interacts with the WekaIO kernel-based VFS (virtual file system) driver which coordinates the Front End (grey box in upper right corner) to the Linux client. Other client-based protocols such as NFS, SMB, S3 and HDFS are also supported. The Front End then interacts with the NIC (which can be 10/100G Ethernet, Infiniband, and NVMeoF) through SR-IOV (single root IO virtualization), bypassing the Linux kernel for maximum throughput. This is with WekaIO’s own networking stack in user space. Continue reading

Sexy HPC storage is all the rage

HPC is sexy

There is no denying it. HPC is sexy. HPC Storage is just as sexy.

Looking at the latest buzz from Super Computing Conference 2018 which happened in Dallas 2 weeks ago, the number of storage related vendors participating was staggering. Panasas, Weka.io, Excelero, BeeGFS, are the ones that I know because I got friends posting their highlights. Then there are the perennial vendors like IBM, Dell, HPE, NetApp, Huawei, Supermicro, and so many more. A quick check on the SC18 website showed that there were 391 exhibitors on the floor.

And this is driven by the unrelentless demand for higher and higher performance of computing, and along with it, the demands for faster and faster storage performance. Commercialization of Artificial Intelligence (AI), Deep Learning (DL) and newer applications and workloads together with the traditional HPC workloads are driving these ever increasing requirements. However, most enterprise storage platforms were not designed to meet the demands of these new generation of applications and workloads, as many have been led to believe. Why so?

I had a couple of conversations with a few well known vendors around the topic of HPC Storage. And several responses thrown back were to put Flash and NVMe to solve the high demands of HPC storage performance. In my mind, these responses were too trivial, too irresponsible. So I wanted to write this blog to share my views on HPC storage, and not just about its performance.

The HPC lines are blurring

I picked up this video (below) a few days ago. It was insideHPC Rich Brueckner interview with Dr. Goh Eng Lim, HPE CTO and renowned HPC expert about the convergence of both traditional and commercial HPC applications and workloads.

I liked the conversation in the video because it addressed the 2 different approaches. And I welcomed Dr. Goh’s invitation to the Commercial HPC community to work with the Traditional HPC vendors to help push the envelope towards Exascale SuperComputing.

Continue reading