Layers in Storage – For better or worse

Storage arrays and storage services are built upon by layers and layers beneath its architecture. The physical components of hard disk drives and solid states are abstracted into RAID volumes, virtualized into other storage constructs before they are exposed as shares/exports, LUNs or objects to the network.

Everyone in the storage networking industry, is cognizant of the layers and it is the foundation of knowledge and experience. The public cloud storage services side is the same, albeit more opaque. Nevertheless, both have layers.

In the early 2000s, SNIA® Technical Council outlined a blueprint of the SNIA® Shared Storage Model, a framework describing layers and properties of a storage system and its services. It was similar to the OSI 7-layer model for networking. The framework helped many industry professionals and practitioners shaped their understanding and the development of knowledge in their respective fields. The layering scheme of the SNIA® Shared Storage Model is shown below:

SNIA Shared Storage Model – The layering scheme

Storage vendors layering scheme

While SNIA® storage layers were generic and open, each storage vendor had their own proprietary implementation of storage layers. Some of these architectures are simple, but some, I find a bit too complex and convoluted.

Here is an example of the layers of the Automated Volume Management (AVM) architecture of the EMC® Celerra®.

EMC Celerra AVM Layering Scheme

I would often scratch my head about AVM. Disks were grouped into RAID groups, which are LUNs (Logical Unit Numbers). Then they were defined as Celerra® dvols (disk volumes), and stripes of the dvols were consolidated into a storage pool.

From the pool, a piece of a storage capacity construct, called a slice volume, were combined with other slice volumes into a metavolume which eventually was presented as a file system to the network and their respective NAS clients. Explaining this took an effort because I was the IP Storage product manager for EMC® between 2007 – 2009. It was a far cry from the simplicity of NetApp® ONTAP 7 architecture of RAID groups and volumes, and the WAFL® (Write Anywhere File Layout) filesystem.

Another complicated layered framework I often gripe about is Ceph. Here is a look of how the layers of CephFS is constructed.

Ceph Storage Layered Framework

I work with the OpenZFS filesystem a lot. It is something I am rather familiar with, and the layered structure of the ZFS filesystem is essentially simpler.

Storage architecture mixology

Engineers are bizarre when they get too creative. They have a can do attitude that transcends the boundaries of practicality sometimes, and boggles many minds. This is what happens when they have their own mixology ideas.

Recently I spoke to two magnanimous persons who had the idea of providing Ceph iSCSI LUNs to the ZFS filesystem in order to use the simplicity of NAS file sharing capabilities in TrueNAS® CORE. From their own words, Ceph NAS capabilities sucked. I had to draw their whole idea out in a Powerpoint and this is the architecture I got from the conversation.

There are 3 different storage subsystems here just to provide NAS. As if Ceph layers aren’t complicated enough, the iSCSI LUNs from Ceph are presented as Cinder volumes to the KVM hypervisor (or VMware® ESXi) through the Cinder driver. Cinder is the persistent storage volume subsystem of the Openstack® project. The Cinder volumes/hypervisor datastore are virtualized as vdisks to the respective VMs installed with TrueNAS® CORE and OpenZFS filesystem. From the TrueNAS® CORE, shares and exports are provisioned via the SMB and NFS protocols to Windows and Linux respectively.

It works! As I was told, it worked!

A.P.P.A.R.M.S.C. considerations

Continuing from the layered framework described above for NAS, other aspects beside the technical work have to be considered, even when it can work technically.

I often use a set of diligent data storage focal points when considering a good storage design and implementation. This is the A.P.P.A.R.M.S.C. Take for instance Protection as one of the points and snapshot is the technology to use.

Snapshots can be executed at the ZFS level on the TrueNAS® CORE subsystem. Snapshots can be trigged at the volume level in Openstack® subsystem and likewise, rbd snapshots at the Ceph subsystem. The question is, which snapshot at which storage subsystem is the most valuable to the operations and business? Do you run all 3 snapshots? How do you execute them in succession in a scheduled policy?

In terms of performance, can it truly maximize its potential? Can it churn out the best IOPS, and deliver at wire speed? What is the latency we can expect with so many layers from 3 different storage subsystems?

And supporting this said architecture would be a nightmare. Where do you even start the troubleshooting?

Those are just a few considerations and questions to think about when such a layered storage architecture along. IMHO, such a design was over-engineered. I was tempted to say “Just because you can, doesn’t mean you should

Elegance in Simplicity

Einstein (I think) quoted:

Einstein’s quote on simplicity and complexity

I am not saying that having too many layers is wrong. Having a heavily layered architecture works for many storage solutions out there, where they are often masked with a simple and intuitive UI. But in yours truly point of view, as a storage architecture enthusiast and connoisseur, there is beauty and elegance in simple designs.

The purpose here is to promote better understanding of the storage layers, and how they integrate and interact with each other to deliver the data services to the network. In the end, that is how most storage architectures are built.

 

Storageless shan’t be thy name

Storageless??? What kind of a tech jargon is that???

This latest jargon irked me. Storage vendor NetApp® (through its acquisition of Spot) and Hammerspace, a metadata-driven storage agnostic orchestration technology company, have begun touting the “storageless” tech jargon in hope that it will become an industry buzzword. Once again, the hype cycle jargon junkies are hard at work.

Clear, empty storage containers

Clear, nondescript storage containers

It is obvious that the storageless jargon wants to ride on the hype of serverless computing, an abstraction method of computing resources where the allocation and the consumption of resources are defined by pieces of programmatic code of the running application. The “calling” of the underlying resources are based on the application’s code, and thus, rendering the computing resources invisible, insignificant and not sexy.

My stand

Among the 3 main infrastructure technology – compute, network, storage, storage technology is a bit of a science and a bit of dark magic. It is complex and that is what makes storage technology so beautiful. The constant innovation and technology advancement continue to make storage as a data services platform relentlessly interesting.

Cloud, Kubernetes and many data-as-a-service platforms require strong persistent storage. As defined by NIST Definition of Cloud Computing, the 4 of the 5 tenets – on-demand self-service, resource pooling, rapid elasticity, measured servicedemand storage to be abstracted. Therefore, I am all for abstraction of storage resources from the data services platform.

But the storageless jargon is doing a great disservice. It is not helping. It does not lend its weight glorifying the innovations of storage. In fact, IMHO, it felt like a weighted anchor sinking storage into the deepest depth, invisible, insignificant and not sexy. I am here dutifully to promote and evangelize storage innovations, and I am duly unimpressed with such a jargon.

Continue reading

A Paean to NFS

It is certainly encouraging to see both NAS protocols, NFS and SMB, featured well in the latest VMware® vSAN 7 Update 1 release. The NFS v3 and v4.1 support was already in vSAN 7.0 when it was earlier announced as part of its Native File Services for vSAN. But some years ago, NFS was not always the primary storage protocol of choice. SAN protocols, Fibre Channel and iSCSI, were almost always designated to serve enterprise applications. At the client side, Windows became prominent, and the SMB/CIFS protocol dominated the landscape of the desktop. This further pushed NFS into the back closet.

NFS or Network File System has its naysayers. The venerable, but often maligned distributed network file protocol is 36 years today. In storage vendors such as NetApp®, VAST Data, Pure Storage FlashBlade, and Dell EMC Isilon, NFS is still positioned as the primary file protocol for manufacturing testers on the shop floor, EDA/eCAD applications, seismic and subsurface applications in Oil & Gas and many more. In another development, just like its presence in the vSAN Native Services,, NFS has also quietly embedded itself into many storage platforms to serve the data platform services within the respective framework itself.

And I have experienced NFS from the client side to the enterprise applications and more, and I take this opportunity to pay tribute.

NFS (Network File System) client server network

NFS (Network File System) client server network

Continue reading

Storage in a shiny multi-cloud space

The multi-cloud for infrastructure-as-a-service (IaaS) era is not here (yet). That is what the technology marketers want you to think. The hype, the vapourware, the frenzy. It is what they do. The same goes to technology analysts where they describe vision and futures, and the high level constructs and strategies to get there. The hype of multi-cloud is often thought of running applications and infrastructure services seamlessly in several public clouds such as Amazon AWS, Microsoft® Azure and Google Cloud Platform, and linking it to on-premises data centers and private clouds. Hybrid is the new black.

Multicloud connectivity to public cloud providers and on-premises private cloud

Multi-Cloud, on-premises, public and hybrid clouds

And the aspiration of multi-cloud is the right one, when it is truly ready. Gartner® wrote a high level article titled “Why Organizations Choose a Multicloud Strategy“. To take advantage of each individual cloud’s strengths and resiliency in respective geographies make good business sense, but there are many other considerations that cannot be an afterthought. In this blog, we look at a few of them from a data storage perspective.

In the beginning there was … 

For this storage dinosaur, data storage and compute have always coupled as one. In the mainframe DASD days. these 2 were together. Even with the rise of networking architectures and protocols, from IBM SNA, DECnet, Ethernet & TCP/IP, and Token Ring FC-SAN (sorry, this is just a joke), the SANs, the filers to the servers were close together, albeit with a network buffered layer.

A decade ago, when the public clouds started appearing, data storage and compute were mostly inseparable. There was demarcation of public clouds and private clouds. The notion of hybrid clouds meant public clouds and private clouds can intermix with on-premise computing and data storage but in almost all cases, this was confined to a single public cloud provider. Until these public cloud providers realized they were not able to entice the larger enterprises to move their IT out of their on-premises data centers to the cloud convincingly. So, these public cloud providers decided to reverse their strategy and peddled their cloud services back to on-prem. Today, Amazon AWS has Outposts; Microsoft® Azure has Arc; and Google Cloud Platform launched Anthos.

Continue reading

Valuing the security value of NAS storage

Garmin paid, reportedly millions. Do you sleep well at night knowing that the scourge of ransomware is rampant and ever threatening your business. Is your storage safe enough or have you invested in a storage which was the economical (also to be known as cheap) to your pocket?

Garmin was hacked by ransomware

I have highlighted this before. NAS (Network Attached Storage) has become the goldmine for ransomware. And in the mire of this COVID-19 pandemic, the lackadaisical attitude of securing the NAS storage remains. Too often than not, end users and customers, especially in the small medium enterprises segment, continue to search for the most economical NAS storage to use in their business.

Is price the only factor?

Why do customers and end users like to look at the price? Is an economical capital outlay of a cheap NAS storage with 3-year hardware and shallow technical support that significant to appease the pocket gods? Some end users might decided to rent cloud file storage, Hotel California style until they counted the 3-year “rental” price.

Continue reading

Persistent Storage could stifle Google Anthos multi-cloud ambitions

To win in the multi-cloud game, you have to be in your competitors’ cloud. Google Cloud has been doing that since they announced Google Anthos just over a year ago. They have been crafting their “assault”, starting with on-premises, and Anthos on AWS. Anthos on Microsoft® Azure is coming, currently in preview mode.

Google CEO Sundar Pichai announcing Google Anthos at Next ’19

BigQuery Omni conversation starter

2 weeks ago, whilst the Google Cloud BigQuery Omni announcement was still under wraps, local Malaysian IT portal Enterprise IT News sent me the embargoed article to seek my views and opinions. I have to admit that I was ignorant about the deeper workings of BigQuery, and haven’t fully gone through the works of Google Anthos as well. So I researched them.

Having done some small works on Qubida (defunct) and Talend several years ago, I have grasped useful data analytics and data enablement concepts, and so BigQuery fitted into my understanding of BigQuery Omni quite well. That triggered my interests to write this blog and meshing the persistent storage conundrum (at least for me it is something to be untangled) to Kubernetes, to GKE (Google Kubernetes Engine), and thus Anthos as well.

For discussion sake, here is an overview of BigQuery Omni.

An overview of Google Cloud BigQuery Omni on multiple cloud providers

My comments and views are in this EITN article “Google Cloud’s BigQuery Omni for Multi-cloud Analytics”.

Continue reading

Resilient Integrated Data Protection against Ransomware

Early in the year, I wrote about NAS systems being a high impact target for ransomware. I called NAS a goldmine for ransomware. This is still very true because NAS systems are the workhorses of many organizations. They serve files and folders and from it, the sharing and collaboration of Work.

Another common function for NAS systems is being a target for backups. In small medium organizations, backup software often direct their backups to a network drive in the network. Even for larger enterprise customers too, NAS is the common destination for backups.

Backup to NAS system

Typical NAS backup for small medium organizations.

Backup to Data Domain with NAS Protocols

Backup to Data Domain with NAS (NFS, CIFS) Protocols

Ransomware is obviously targeting the backup as another high impact target, with the potential to disrupt the rescue and the restoration of the work files and folders.

Continue reading

Down the rabbit hole with Kubernetes Storage

Kubernetes is on fire. Last week VMware® released the State of Kubernetes 2020 report which surveyed companies with 1,000 employees and above. Results were not surprising as the adoptions of this nascent technology are booming. But persistent storage remained the nagging concern for the Kubernetes serving the infrastructure resources to applications instances running in the containers of a pod in a cluster.

The standardization of storage resources have settled with CSI (Container Storage Interface). Storage vendors have almost, kind of, sort of agreed that the API objects such as PersistentVolumes, PersistentVolumeClaims, StorageClasses, along with the parameters would be the way to request the storage resources from the Pre-provisioned Volumes via the CSI driver plug-in. There are already more than 50 vendor specific CSI drivers in Github.

Kubernetes and CSI initiative

Kubernetes and the CSI (Container Storage Interface) logos

The CSI plug-in method is the only way for Kubernetes to scale and keep its dynamic, loadable storage resource integration with external 3rd party vendors, all clamouring to grab a piece of this burgeoning demands both in the cloud and in the enterprise.

Continue reading

A Dialogue between 2 Drives

I was talking to an end user who was slowly getting exposed to the cloud amid this Covid-19 pandemic. The whole work from home thingy was not new to him, but the scale of the practice suddenly escalated when more than 80 of his staff have to work from wherever they were stuck at during the past 6 weeks. Initially all of his staff had to alternate their folders and files access because their Sonicwall® Global Client license and SSL VPN Clients were inadequate. Even after their upgrade of the licenses, the performance of getting the folders and files through the Z: drive was poor and the network was chocked up. I told them that regardless, the SMB protocol of the NAS shared folders was chatty and generated a lot of network traffic on the VPN, along with the inadequacies of running this over the wide area Internet network. Staff productivity obviously nosedived.

We are now exploring putting their work in the cloud but maintaining a consistent synchronized set of folders and files at all times. Wasabi® Cloud has emerged the most attractive price/GB/month and no egress or API requests fees.

Combining 2 shared drives into one

NAS Drive talking to Cloud Drive like 2 buddies

Now here is a story of 2 Drives

The end user is not an IT savvy user. They were unfamiliar with Cloud Storage other than the free personal ones like Google Drive, or Dropbox. They have more than 200TB and I have introduced to them Wasabi® Cloud. They were very familiar with their Z:, their NAS Drive. I introduced to them the Cloud Drive.

NAS: Hey, how’s it going?

Cloud: Not bad. My boss and your boss are talking about bringing me and Wasabi® Cloud to join your gang. Hope you are OK with that.

Continue reading

Falconstor Software Defined Data Preservation for the Next Generation

Falconstor® Software is gaining momentum. Given its arduous climb back to the fore, it is beginning to soar again.

Tape technology and Digital Data Preservation

I mentioned that long term digital data preservation is a segment within the data lifecycle which has merits and prominence. SNIA® has proved that this is a strong growing market segment through its 2007 and 2017 “100 Year Archive” surveys, respectively. 3 critical challenges of this long, long-term digital data preservation is to keep the archives

  • Accessible
  • Undamaged
  • Usable

For the longest time, tape technology has been the king of the hill for digital data preservation. The technology is cheap, mature, and many enterprises has built their long term strategy around it. And the pulse in the tape technology market is still very healthy.

The challenges of tape remain. Every 5 years or so, companies have to consider moving the data on the existing tape technology to the next generation. It is widely known that LTO can read tapes of the previous 2 generations, and write to it a generation before. The tape transcription process of migrating digital data for the sake of data preservation is bad because it affects the structural integrity and quality of the content of the data.

In my times covering the Oil & Gas subsurface data management, I have seen NOCs (national oil companies) with 500,000 tapes of all generations, from 1/2″ to DDS, DAT to SDLT, 3590 to LTO 1-7. And millions are spent to transcribe these tapes every few years and we have folks like Katalyst DM, Troika and more hovering this landscape for their fill.

Continue reading