Open Source Storage Technology Crafters

The conversation often starts with a challenge. “What’s so great about open source storage technology?

For the casual end users of storage systems, regardless of SAN (definitely not Fibre Channel) or NAS on-premises, or getting “files” from the personal cloud storage like Dropbox, OneDrive et al., there is a strong presumption that open source storage technology is cheap and flaky. This is not helped with the diet of consumer brands of NAS in the market, where the price is cheap, but the storage offering with capabilities, reliability and performance are found to be wanting. Thus this notion floats its way to the business and enterprise users, and often ended up with a negative perception of open source storage technology.

Highway Signpost with Open Source wording

Storage Assemblers

Anybody can “build” a storage system with open source storage software. Put the software together with any commodity x86 server, and it can function with the basic storage services. Most open source storage software can do the job pretty well. However, once the completed storage technology is put together, can it do the job well enough to serve a business critical end user? I have plenty of sob stories from end users I have spoken to in these many years in the industry related to so-called “enterprise” storage vendors. I wrote a few blogs in the past that related to these sad situations:

We have such storage offerings rigged with cybersecurity risks and holes too. In a recent Unit 42 report, 250,000 NAS devices are vulnerable and exposed to the public Internet. The brands in question are mentioned in the report.

I would categorize these as storage assemblers.

Continue reading

Is Software Defined right for Storage?

George Herbert Leigh Mallory, mountaineer extraordinaire, was once asked “Why did you want to climb Mount Everest?“, in which he replied “Because it’s there“. That retort demonstrated the indomitable human spirit and probably exemplified best the relationship between the human being’s desire to conquer the physical limits of nature. The software of humanity versus the hardware of the planet Earth.

Juxtaposing, similarities can be said between software and hardware in computer systems, in storage technology per se. In it, there are a few schools of thoughts when it comes to delivering storage services with the notable ones being the storage appliance model and the software-defined storage model.

There are arguments, of course. Some are genuinely partisan but many a times, these arguments come in the form of the flavour of the moment. I have experienced in my past companies touting the storage appliance model very strongly in the beginning, and only to be switching to a “software company” chorus years after that. That was what I meant about the “flavour of the moment”.

Software Defined Storage

Continue reading

Layers in Storage – For better or worse

Storage arrays and storage services are built upon by layers and layers beneath its architecture. The physical components of hard disk drives and solid states are abstracted into RAID volumes, virtualized into other storage constructs before they are exposed as shares/exports, LUNs or objects to the network.

Everyone in the storage networking industry, is cognizant of the layers and it is the foundation of knowledge and experience. The public cloud storage services side is the same, albeit more opaque. Nevertheless, both have layers.

In the early 2000s, SNIA® Technical Council outlined a blueprint of the SNIA® Shared Storage Model, a framework describing layers and properties of a storage system and its services. It was similar to the OSI 7-layer model for networking. The framework helped many industry professionals and practitioners shaped their understanding and the development of knowledge in their respective fields. The layering scheme of the SNIA® Shared Storage Model is shown below:

SNIA Shared Storage Model – The layering scheme

Storage vendors layering scheme

While SNIA® storage layers were generic and open, each storage vendor had their own proprietary implementation of storage layers. Some of these architectures are simple, but some, I find a bit too complex and convoluted.

Here is an example of the layers of the Automated Volume Management (AVM) architecture of the EMC® Celerra®.

EMC Celerra AVM Layering Scheme

I would often scratch my head about AVM. Disks were grouped into RAID groups, which are LUNs (Logical Unit Numbers). Then they were defined as Celerra® dvols (disk volumes), and stripes of the dvols were consolidated into a storage pool.

From the pool, a piece of a storage capacity construct, called a slice volume, were combined with other slice volumes into a metavolume which eventually was presented as a file system to the network and their respective NAS clients. Explaining this took an effort because I was the IP Storage product manager for EMC® between 2007 – 2009. It was a far cry from the simplicity of NetApp® ONTAP 7 architecture of RAID groups and volumes, and the WAFL® (Write Anywhere File Layout) filesystem.

Another complicated layered framework I often gripe about is Ceph. Here is a look of how the layers of CephFS is constructed.

Ceph Storage Layered Framework

I work with the OpenZFS filesystem a lot. It is something I am rather familiar with, and the layered structure of the ZFS filesystem is essentially simpler.

Storage architecture mixology

Engineers are bizarre when they get too creative. They have a can do attitude that transcends the boundaries of practicality sometimes, and boggles many minds. This is what happens when they have their own mixology ideas.

Recently I spoke to two magnanimous persons who had the idea of providing Ceph iSCSI LUNs to the ZFS filesystem in order to use the simplicity of NAS file sharing capabilities in TrueNAS® CORE. From their own words, Ceph NAS capabilities sucked. I had to draw their whole idea out in a Powerpoint and this is the architecture I got from the conversation.

There are 3 different storage subsystems here just to provide NAS. As if Ceph layers aren’t complicated enough, the iSCSI LUNs from Ceph are presented as Cinder volumes to the KVM hypervisor (or VMware® ESXi) through the Cinder driver. Cinder is the persistent storage volume subsystem of the Openstack® project. The Cinder volumes/hypervisor datastore are virtualized as vdisks to the respective VMs installed with TrueNAS® CORE and OpenZFS filesystem. From the TrueNAS® CORE, shares and exports are provisioned via the SMB and NFS protocols to Windows and Linux respectively.

It works! As I was told, it worked!

A.P.P.A.R.M.S.C. considerations

Continuing from the layered framework described above for NAS, other aspects beside the technical work have to be considered, even when it can work technically.

I often use a set of diligent data storage focal points when considering a good storage design and implementation. This is the A.P.P.A.R.M.S.C. Take for instance Protection as one of the points and snapshot is the technology to use.

Snapshots can be executed at the ZFS level on the TrueNAS® CORE subsystem. Snapshots can be trigged at the volume level in Openstack® subsystem and likewise, rbd snapshots at the Ceph subsystem. The question is, which snapshot at which storage subsystem is the most valuable to the operations and business? Do you run all 3 snapshots? How do you execute them in succession in a scheduled policy?

In terms of performance, can it truly maximize its potential? Can it churn out the best IOPS, and deliver at wire speed? What is the latency we can expect with so many layers from 3 different storage subsystems?

And supporting this said architecture would be a nightmare. Where do you even start the troubleshooting?

Those are just a few considerations and questions to think about when such a layered storage architecture along. IMHO, such a design was over-engineered. I was tempted to say “Just because you can, doesn’t mean you should

Elegance in Simplicity

Einstein (I think) quoted:

Einstein’s quote on simplicity and complexity

I am not saying that having too many layers is wrong. Having a heavily layered architecture works for many storage solutions out there, where they are often masked with a simple and intuitive UI. But in yours truly point of view, as a storage architecture enthusiast and connoisseur, there is beauty and elegance in simple designs.

The purpose here is to promote better understanding of the storage layers, and how they integrate and interact with each other to deliver the data services to the network. In the end, that is how most storage architectures are built.

 

A Paean to NFS

It is certainly encouraging to see both NAS protocols, NFS and SMB, featured well in the latest VMware® vSAN 7 Update 1 release. The NFS v3 and v4.1 support was already in vSAN 7.0 when it was earlier announced as part of its Native File Services for vSAN. But some years ago, NFS was not always the primary storage protocol of choice. SAN protocols, Fibre Channel and iSCSI, were almost always designated to serve enterprise applications. At the client side, Windows became prominent, and the SMB/CIFS protocol dominated the landscape of the desktop. This further pushed NFS into the back closet.

NFS or Network File System has its naysayers. The venerable, but often maligned distributed network file protocol is 36 years today. In storage vendors such as NetApp®, VAST Data, Pure Storage FlashBlade, and Dell EMC Isilon, NFS is still positioned as the primary file protocol for manufacturing testers on the shop floor, EDA/eCAD applications, seismic and subsurface applications in Oil & Gas and many more. In another development, just like its presence in the vSAN Native Services,, NFS has also quietly embedded itself into many storage platforms to serve the data platform services within the respective framework itself.

And I have experienced NFS from the client side to the enterprise applications and more, and I take this opportunity to pay tribute.

NFS (Network File System) client server network

NFS (Network File System) client server network

Continue reading

Give back or no give

[ Disclosure: I work for iXsystems™ Inc. Views and opinions are my own. ]

If my memory served me right, I recalled the illustrious leader of the Illumos project, Garrett D’Amore ranting about companies, big and small, taking OpenZFS open source codes and projects to incorporate into their own technology but hardly ever giving back to the open source community. That was almost 6 years ago.

My thoughts immediately go back to the days when open source was starting to take off back in the early 2000s. Oracle 9i database had just embraced Linux in a big way, and the book by Eric S. Raymond, “The Cathedral and The Bazaar” was a big hit.

The Cathedral & The Bazaar by Eric S. Raymond

Since then, the blooming days of proprietary software world began to wilt, and over the next twenty plus year, open source software has pretty much taken over the world. Even Microsoft®, the ruthless ruler of the Evil Empire caved in to some of the open source calls. The Microsoft® “I Love Linux” embrace definitely gave the victory feeling of the Rebellion win over the Empire. Open Source won.

Open Source bag of worms

Even with the concerted efforts of the open source communities and projects, there were many situations which have caused frictions and inadvertently, major issues as well. There are several open source projects licenses, and they are not always compatible when different open source projects mesh together for the greater good.

On the storage side of things, 2 “incidents” caught the attention of the masses. For instance, Linus Torvalds, Linux BDFL (Benevolent Dictator for Life) and emperor supremo said “Don’t use ZFS” partly due to the ignorance and incompatibility of Linux GPL (General Public License) and ZFS CDDL (Common Development and Distribution License). That ruffled some feathers amongst the OpenZFS community that Matt Ahrens, the co-creator of the ZFS file system and OpenZFS community leader had to defend OpenZFS from Linus’ comments.

Continue reading

The instant value of Open Source Storage

[ Full disclosure: I work for iXsystems™ . Opinions and views are mine. ]

TrueNAS Open Storage logo

The story began …

It was 2011. A friend of a friend called me out of the blue. He was rambling about his company’s storage needs. I recalled vividly that he wanted 100TB, and Dell and HP (before HPE) were hopeless doing NAS (network attached storage) in an Apple environment. They assembled a Frankenstein-ish NAS and plastered a price over MYR$100K around it.

In his environment, the Apple workstations were connected to dozens of WD Cloud Book storage (whatever it was called back then), daisy chained via Firewire to each other. I recalled one workstation had 3 WD “books” daisy chained together. They got the exploding storage needs but performance sucked. With every 2nd or 3rd user, access to files were at a snail pace, taking up to more than 2 minutes to open a file sometimes.

At that time, my old colleague at Sun was fervently talking about ZFS and OpenSolaris™. I told him about this opportunity, and so we began. It was him who used the word “crafter”. “We are not building“, he said, “we are crafting“. He was right.

OpenSolaris logo

Continue reading

Down the rabbit hole with Kubernetes Storage

Kubernetes is on fire. Last week VMware® released the State of Kubernetes 2020 report which surveyed companies with 1,000 employees and above. Results were not surprising as the adoptions of this nascent technology are booming. But persistent storage remained the nagging concern for the Kubernetes serving the infrastructure resources to applications instances running in the containers of a pod in a cluster.

The standardization of storage resources have settled with CSI (Container Storage Interface). Storage vendors have almost, kind of, sort of agreed that the API objects such as PersistentVolumes, PersistentVolumeClaims, StorageClasses, along with the parameters would be the way to request the storage resources from the Pre-provisioned Volumes via the CSI driver plug-in. There are already more than 50 vendor specific CSI drivers in Github.

Kubernetes and CSI initiative

Kubernetes and the CSI (Container Storage Interface) logos

The CSI plug-in method is the only way for Kubernetes to scale and keep its dynamic, loadable storage resource integration with external 3rd party vendors, all clamouring to grab a piece of this burgeoning demands both in the cloud and in the enterprise.

Continue reading

Paradigm shift of Dev to Storage Ops

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at the event. The content of this blog is of my own opinions and views ]

A funny photo (below) came up on my Facebook feed a couple of weeks back. In an honest way, it depicted how a developer would think (or the lack of thinking) about the storage infrastructure designs and models for the applications and workloads. This also reminded me of how DBAs used to diss storage engineers. “I don’t care about storage, as long as it is RAID 10“. That was aeons ago 😉

The world of developers and the world of infrastructure people are vastly different. Since cloud computing birthed, both worlds have collided and programmable infrastructure-as-code (IAC) have become part and parcel of cloud native applications. Of course, there is no denying that there is friction.

Welcome to DevOps!

The Kubernetes factor

Containerized applications are quickly defining the cloud native applications landscape. The container orchestration machinery has one dominant engine – Kubernetes.

In the world of software development and delivery, DevOps has taken a liking to containers. Containers make it easier to host and manage life-cycle of web applications inside the portable environment. It packages up application code other dependencies into building blocks to deliver consistency, efficiency, and productivity. To scale to a multi-applications, multi-cloud with th0usands and even tens of thousands of microservices in containers, the Kubernetes factor comes into play. Kubernetes handles tasks like auto-scaling, rolling deployment, computer resource, volume storage and much, much more, and it is designed to run on bare metal, in the data center, public cloud or even a hybrid cloud.

Continue reading

Is General Purpose Object Storage disenfranchised?

[Disclosure: I am invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees will be covered by GestaltIT, the organizer and I am not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

This is NOT an advertisement for coloured balls.

This is the license to brag for the vendors in the next 2 weeks or so, as we approach the 2020 new year. This, of course, is the latest 2019 IDC Marketscape for Object-based Storage, released last week.

My object storage mentions

I have written extensively about Object Storage since 2011. With different angles and perspectives, here are some of them:

Continue reading