Object Storage becoming storage lingua franca of Edge-Core-Cloud

Data Fabric was a big buzzword going back several years. I wrote a piece talking about Data Fabric, mostly NetApp®’s,  almost 7 years ago, which I titled “The Transcendence of Data Fabric“. Regardless of storage brands and technology platforms, and each has its own version and interpretations, one thing holds true. There must be a one layer of Data Singularity. But this is easier said than done.

Fast forward to present. The latest buzzword is Edge-to-Core-Cloud or Cloud-to-Core-Edge. The proliferation of Cloud Computing services, has spawned beyond to multiclouds, superclouds and of course, to Edge Computing. Data is reaching to so many premises everywhere, and like water, data has found its way.

Edge-to-Core-to-Cloud (Gratitude thanks to https://www.techtalkthai.com/dell-technologies-opens-iot-solutions-division-and-introduces-distributed-core-architecture/)

The question on my mind is can we have a single storage platform to serve the Edge-to-Core-to-Cloud paradigm? Is there a storage technology which can be the seamless singularity of data? 7+ years onwards since my Data Fabric blog, The answer is obvious. Object Storage.

The ubiquitous object storage and the S3 access protocol

For a storage technology that was initially labeled “cheap and deep”, object storage has become immensely popular with developers, cloud storage providers and is fast becoming storage repositories for data connectors. I wrote a piece called “All the Sources and Sinks going to Object Storage” over a month back, which aptly articulate how far this technology has come.

But unknown to many (Google NASD and little is found), object storage started its presence in SNIA (it was developed in Carnegie-Mellon University prior to that) in the early 90s, then known as NASD (network attached secure disk). As it is made its way into the ANSI T10 INCITS standards development, it became known as Object-based Storage Device or OSD.

The introduction of object storage services 16+ years ago by Amazon Web Services (AWS) via their Simple Storage Services (S3) further strengthened the march of object storage, solidified its status as a top tier storage platform. It was to AWS’ genius to put the REST API over HTTP/HTTPS with its game changing approach to use CRUD (create, retrieve, update, delete) operations to work with object storage. Hence the S3 protocol, which has become the de facto access protocol to object storage.

Yes, I wrote those 2 blogs 11 and 9 years ago respectively because I saw that object storage technology was a natural fit to the burgeoning new world of storage computing. It has since come true many times over.

Continue reading

Building Trust in the Storage Brand

Trust is everything. When done right, the brand is trust.

One Wikibon article last month “Does Hardware (still) Matter?” touched on my sentiments and hit close to the heart. As the world becomes more and more data driven and cloud-centric, the prominence of IT infrastructure has diminished from the purview of the boardroom. The importance of IT infrastructure cannot be discounted but in this new age, storage infrastructure has become invisible.

In the seas of both on-premises and hybrid storage technology solutions, everyone is trying to stand out, trying to eke the minutest ounces of differentiation and advantage to gain the customer’s micro-attention. With all the drum beatings, the loyalty of the customer can switch in an instance unless we build trust.

I ponder a few storage industry variables that help build trust.

Open source Communities and tribes

During the hey-days of proprietary software and OSes, protectionism was key to guarding the differentiations and the advantages. Licenses were common, and some were paired with the hardware hostid to create that “power combination”. And who can forget those serial dongles license keys? Urgh!!

Since the open source movement (Read The Cathedral and the Bazaar publication) began, the IT world has begun to trust software and OSes more and more. Open Source communities grew and technology tribes were formed in all types of niches, including storage software. Trust grew because the population of the communities kept the vendors honest. Gone are the days of the Evil Empire. Even Microsoft® became a ‘cool kid’.

TRUST

One open source storage filesystem I worked extensively on is OpenZFS. From its beginnings after Open Solaris® (remember build 134), becoming part of the Illumos project and then later in FreeBSD® and Linux upstream. Trust in OpenZFS was developed over time because of the open source model. It has spawned many storage projects including FreeNAS™ which later became TrueNAS®.

Continue reading

I built a 6-node Gluster cluster with TrueNAS SCALE

I haven’t had hands-on with Gluster for over a decade. My last blog about Gluster was in 2011, right after I did a proof-of-concept for the now defunct, Jaring, Malaysia’s first ISP (Internet Service Provider). But I followed Gluster’s development on and off, until I found out that Gluster was a feature in then upcoming TrueNAS® SCALE. That was almost 2 years ago, just before I accepted to offer to join iXsystems™, my present employer.

The eagerness to test drive Gluster (again) on TrueNAS® SCALE has always been there but I waited for SCALE to become GA. GA finally came on February 22, 2022. My plans for the test rig was laid out, and in the past few weeks, I have been diligently re-learning and putting up the scope to built a 6-node Gluster clustered storage with TrueNAS® SCALE VMs on Virtualbox®.

Gluster on OpenZFS with TrueNAS SCALE

Before we continue, I must warn that this is not pretty. I have limited computing resources in my homelab, but Gluster worked beautifully once I ironed out the inefficiencies. Secondly, this is not a performance test as well, for obvious reasons. So, this is the annals along with the trials and tribulations of my 6-node Gluster cluster test rig on TrueNAS® SCALE.

Continue reading

Exploring the venerable NFS Ganesha

As TrueNAS® SCALE approaches its General Availability date in less than 10 days time, one of the technology pieces I am extremely excited about in TrueNAS® SCALE is the NFS Ganesha server. It is still early days to see the full prowess of NFS Ganesha in TrueNAS® SCALE, but the potential of Ganesha’s capabilities in iXsystems™ new scale-out storage technology is very, very promising.

NFS Ganesha

I love Network File System (NFS). It was one of the main reasons I was so attracted to Sun Microsystems® SunOS in the first place. 6 months before I graduated, I took a Unix systems programming course in C in the university. The labs were on Sun 3/60 workstations. Coming from a background of a VAX/VMS system administrator in the school’s lab, Unix became a revelation for me. It completely (and blissfully) opened my eyes to open technology, and NFS was the main catalyst. Till this day, my devotion to Unix remained sacrosanct because of the NFS spark aeons ago.

I don’t know NFS Ganesha. I knew of its existence for almost a decade, but I have never used it. Most of the NFS daemons/servers I worked with were kernel NFS, and these included NFS services in Sun SunOS/Solaris, several Linux flavours – Red Hat®, SuSE®, Ubuntu, BSD variants in FreeBSD and MacOS, the older Unices of the 90s – HP-UX, Ultrix, AIX and Irix along with SCO Unix and Microsoft® XenixNetApp® ONTAP™, EMC® Isilon (very briefly), Hitachi® HNAS (née BlueArc) and of course, in these past 5-6 years FreeNAS®/TrueNAS™.

So, as TrueNAS® SCALE beckons, I took to this weekend to learn a bit about NFS Ganesha. Here are what I have learned.

Continue reading

A conceptual distributed enterprise HCI with open source software

Cloud computing has changed everything, at least at the infrastructure level. Kubernetes is changing everything as well, at the application level. Enterprises are attracted by tenets of cloud computing and thus, cloud adoption has escalated. But it does not have to be a zero-sum game. Hybrid computing can give enterprises a balanced choice, and they can take advantage of the best of both worlds.

Open Source has changed everything too because organizations now has a choice to balance their costs and expenditures with top enterprise-grade software. The challenge is what can organizations do to put these pieces together using open source software? Integration of open source infrastructure software and applications can be complex and costly.

The next version of HCI

Hyperconverged Infrastructure (HCI) also changed the game. Integration of compute, network and storage became easier, more seamless and less costly when HCI entered the market. Wrapped with a single control plane, the HCI management component can orchestrate VM (virtual machine) resources without much friction. That was HCI 1.0.

But HCI 1.0 was challenged, because several key components of its architecture were based on DAS (direct attached) storage. Scaling storage from a capacity point of view was limited by storage components attached to the HCI architecture. Some storage vendors decided to be creative and created dHCI (disaggregated HCI). If you break down the components one by one, in my opinion, dHCI is just a SAN (storage area network) to HCI. Maybe this should be HCI 1.5.

A new version of an HCI architecture is swimming in as Angelfish

Kubernetes came into the HCI picture in recent years. Without the weights and dependencies of VMs and DAS at the HCI server layer, lightweight containers orchestrated, mostly by, Kubernetes, made distribution of compute easier. From on-premises to cloud and in between, compute resources can easily spun up or down anywhere.

Continue reading

How well do you know your data and the storage platform that processes the data

Last week was consumed by many conversations on this topic. I was quite jaded, really. Unfortunately many still take a very simplistic view of all the storage technology, or should I say over-marketing of the storage technology. So much so that the end users make incredible assumptions of the benefits of a storage array or software defined storage platform or even cloud storage. And too often caveats of turning on a feature and tuning a configuration to the max are discarded or neglected. Regards for good storage and data management best practices? What’s that?

I share some of my thoughts handling conversations like these and try to set the right expectations rather than overhype a feature or a function in the data storage services.

Complex data networks and the storage services that serve it

I/O Characteristics

Applications and workloads (A&W) read and write from the data storage services platforms. These could be local DAS (direct access storage), network storage arrays in SAN and NAS, and now objects, or from cloud storage services. Regardless of structured or unstructured data, different A&Ws have different behavioural I/O patterns in accessing data from storage. Therefore storage has to be configured at best to match these patterns, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:

  • Random and Sequential patterns
  • Block sizes of these A&Ws ranging from typically 4K to 1024K.
  • Causal effects of synchronous and asynchronous I/Os to and from the storage

Continue reading

OpenZFS with Object Storage

At AWS re:Invent last week, Amazon Web Services announced Amazon FSx for OpenZFS. This is the 4th managed service under the Amazon FSx umbrella, joining NetApp® ONTAP™, Lustre and Windows File Server. The highly scalable OpenZFS filesystem can provide high throughput and IOPS bandwidth to Amazon EC2, ECS, EKS and VMware® Cloud on AWS.

I am assuming the AWS OpenZFS uses EBS as the block storage backend, given the announcement that it can deliver 4GB/sec of throughput and 160,000 IOPS from the “drives” without caching. How the OpenZFS is provisioned to the AWS clients is well documented in this blog here. It is an absolutely joy (for me) to see the open source OpenZFS filesystem getting the validation and recognization from AWS. This is one hell of a filesystem.

But this blog isn’t about AWS FSx for OpenZFS with block storage. It is about what is coming, and eventually AWS FSx for OpenZFS could expand into AWS’s proficient S3 storage as well.  Can OpenZFS integrate with an S3 object storage backend? This blog looks into the burning question.

In the recently concluded OpenZFS Developer Summit 2021, one of the topics was “ZFS on Object Storage“, and the short answer is a resounding YES!

OpenZFS Developer Summit 2021

Continue reading

Open Source Storage Technology Crafters

The conversation often starts with a challenge. “What’s so great about open source storage technology?

For the casual end users of storage systems, regardless of SAN (definitely not Fibre Channel) or NAS on-premises, or getting “files” from the personal cloud storage like Dropbox, OneDrive et al., there is a strong presumption that open source storage technology is cheap and flaky. This is not helped with the diet of consumer brands of NAS in the market, where the price is cheap, but the storage offering with capabilities, reliability and performance are found to be wanting. Thus this notion floats its way to the business and enterprise users, and often ended up with a negative perception of open source storage technology.

Highway Signpost with Open Source wording

Storage Assemblers

Anybody can “build” a storage system with open source storage software. Put the software together with any commodity x86 server, and it can function with the basic storage services. Most open source storage software can do the job pretty well. However, once the completed storage technology is put together, can it do the job well enough to serve a business critical end user? I have plenty of sob stories from end users I have spoken to in these many years in the industry related to so-called “enterprise” storage vendors. I wrote a few blogs in the past that related to these sad situations:

We have such storage offerings rigged with cybersecurity risks and holes too. In a recent Unit 42 report, 250,000 NAS devices are vulnerable and exposed to the public Internet. The brands in question are mentioned in the report.

I would categorize these as storage assemblers.

Continue reading

Enterprise Storage is not just a Label

I have many anecdotes around the topic of Enterprise Storage, but the conversations in the past 2 weeks made it important for me to share this.

Enterprise Storage is …

Amusing, painful, angry

I get riled up whenever people do not want to be educated about Enterprise Storage. Here are a few that happened in the last 2 weeks.

[ Story #1 ]

A guy was building his own storage for cryptocurrency. He was informed by his supplier that the RAID card was enterprise, and he could get the best performance using “Enterprise” RAID-0.

  • Well, “Enterprise” RAID-0 volume crashed, and he lost all data. Painfully, he said he lost a hefty sum financially

[ Story #2 ]

A media company complained about the reliability of previous storage vendor. The GM was shopping around and was told that there are “Enterprise” SATA drives and the reliability is as good, if not better than SAS drives.

  • The company wanted a fully reliable Enterprise Storage system with 99.999% availability, and yet the SATA interface was not meant to build a more highly reliable enterprise storage. The GM insisted to use “Enterprise” SATA drives for his “enterprise” storage system instead of SAS.  

[ Story #3 ]

An IT admin of a manufacturing company claimed that they had an “Enterprise Storage” system for a few years, and could not figure out why his hard disk drives would die every 12-15 months.

  • He figured out that the drives supplied by his vendor were consumer SATA drives, even though he was told it was an “Enterprise Storage” system when he bought the system.

Continue reading

Layers in Storage – For better or worse

Storage arrays and storage services are built upon by layers and layers beneath its architecture. The physical components of hard disk drives and solid states are abstracted into RAID volumes, virtualized into other storage constructs before they are exposed as shares/exports, LUNs or objects to the network.

Everyone in the storage networking industry, is cognizant of the layers and it is the foundation of knowledge and experience. The public cloud storage services side is the same, albeit more opaque. Nevertheless, both have layers.

In the early 2000s, SNIA® Technical Council outlined a blueprint of the SNIA® Shared Storage Model, a framework describing layers and properties of a storage system and its services. It was similar to the OSI 7-layer model for networking. The framework helped many industry professionals and practitioners shaped their understanding and the development of knowledge in their respective fields. The layering scheme of the SNIA® Shared Storage Model is shown below:

SNIA Shared Storage Model – The layering scheme

Storage vendors layering scheme

While SNIA® storage layers were generic and open, each storage vendor had their own proprietary implementation of storage layers. Some of these architectures are simple, but some, I find a bit too complex and convoluted.

Here is an example of the layers of the Automated Volume Management (AVM) architecture of the EMC® Celerra®.

EMC Celerra AVM Layering Scheme

I would often scratch my head about AVM. Disks were grouped into RAID groups, which are LUNs (Logical Unit Numbers). Then they were defined as Celerra® dvols (disk volumes), and stripes of the dvols were consolidated into a storage pool.

From the pool, a piece of a storage capacity construct, called a slice volume, were combined with other slice volumes into a metavolume which eventually was presented as a file system to the network and their respective NAS clients. Explaining this took an effort because I was the IP Storage product manager for EMC® between 2007 – 2009. It was a far cry from the simplicity of NetApp® ONTAP 7 architecture of RAID groups and volumes, and the WAFL® (Write Anywhere File Layout) filesystem.

Another complicated layered framework I often gripe about is Ceph. Here is a look of how the layers of CephFS is constructed.

Ceph Storage Layered Framework

I work with the OpenZFS filesystem a lot. It is something I am rather familiar with, and the layered structure of the ZFS filesystem is essentially simpler.

Storage architecture mixology

Engineers are bizarre when they get too creative. They have a can do attitude that transcends the boundaries of practicality sometimes, and boggles many minds. This is what happens when they have their own mixology ideas.

Recently I spoke to two magnanimous persons who had the idea of providing Ceph iSCSI LUNs to the ZFS filesystem in order to use the simplicity of NAS file sharing capabilities in TrueNAS® CORE. From their own words, Ceph NAS capabilities sucked. I had to draw their whole idea out in a Powerpoint and this is the architecture I got from the conversation.

There are 3 different storage subsystems here just to provide NAS. As if Ceph layers aren’t complicated enough, the iSCSI LUNs from Ceph are presented as Cinder volumes to the KVM hypervisor (or VMware® ESXi) through the Cinder driver. Cinder is the persistent storage volume subsystem of the Openstack® project. The Cinder volumes/hypervisor datastore are virtualized as vdisks to the respective VMs installed with TrueNAS® CORE and OpenZFS filesystem. From the TrueNAS® CORE, shares and exports are provisioned via the SMB and NFS protocols to Windows and Linux respectively.

It works! As I was told, it worked!

A.P.P.A.R.M.S.C. considerations

Continuing from the layered framework described above for NAS, other aspects beside the technical work have to be considered, even when it can work technically.

I often use a set of diligent data storage focal points when considering a good storage design and implementation. This is the A.P.P.A.R.M.S.C. Take for instance Protection as one of the points and snapshot is the technology to use.

Snapshots can be executed at the ZFS level on the TrueNAS® CORE subsystem. Snapshots can be trigged at the volume level in Openstack® subsystem and likewise, rbd snapshots at the Ceph subsystem. The question is, which snapshot at which storage subsystem is the most valuable to the operations and business? Do you run all 3 snapshots? How do you execute them in succession in a scheduled policy?

In terms of performance, can it truly maximize its potential? Can it churn out the best IOPS, and deliver at wire speed? What is the latency we can expect with so many layers from 3 different storage subsystems?

And supporting this said architecture would be a nightmare. Where do you even start the troubleshooting?

Those are just a few considerations and questions to think about when such a layered storage architecture along. IMHO, such a design was over-engineered. I was tempted to say “Just because you can, doesn’t mean you should

Elegance in Simplicity

Einstein (I think) quoted:

Einstein’s quote on simplicity and complexity

I am not saying that having too many layers is wrong. Having a heavily layered architecture works for many storage solutions out there, where they are often masked with a simple and intuitive UI. But in yours truly point of view, as a storage architecture enthusiast and connoisseur, there is beauty and elegance in simple designs.

The purpose here is to promote better understanding of the storage layers, and how they integrate and interact with each other to deliver the data services to the network. In the end, that is how most storage architectures are built.