Object Storage becoming storage lingua franca of Edge-Core-Cloud

Data Fabric was a big buzzword going back several years. I wrote a piece talking about Data Fabric, mostly NetApp®’s,  almost 7 years ago, which I titled “The Transcendence of Data Fabric“. Regardless of storage brands and technology platforms, and each has its own version and interpretations, one thing holds true. There must be a one layer of Data Singularity. But this is easier said than done.

Fast forward to present. The latest buzzword is Edge-to-Core-Cloud or Cloud-to-Core-Edge. The proliferation of Cloud Computing services, has spawned beyond to multiclouds, superclouds and of course, to Edge Computing. Data is reaching to so many premises everywhere, and like water, data has found its way.

Edge-to-Core-to-Cloud (Gratitude thanks to https://www.techtalkthai.com/dell-technologies-opens-iot-solutions-division-and-introduces-distributed-core-architecture/)

The question on my mind is can we have a single storage platform to serve the Edge-to-Core-to-Cloud paradigm? Is there a storage technology which can be the seamless singularity of data? 7+ years onwards since my Data Fabric blog, The answer is obvious. Object Storage.

The ubiquitous object storage and the S3 access protocol

For a storage technology that was initially labeled “cheap and deep”, object storage has become immensely popular with developers, cloud storage providers and is fast becoming storage repositories for data connectors. I wrote a piece called “All the Sources and Sinks going to Object Storage” over a month back, which aptly articulate how far this technology has come.

But unknown to many (Google NASD and little is found), object storage started its presence in SNIA (it was developed in Carnegie-Mellon University prior to that) in the early 90s, then known as NASD (network attached secure disk). As it is made its way into the ANSI T10 INCITS standards development, it became known as Object-based Storage Device or OSD.

The introduction of object storage services 16+ years ago by Amazon Web Services (AWS) via their Simple Storage Services (S3) further strengthened the march of object storage, solidified its status as a top tier storage platform. It was to AWS’ genius to put the REST API over HTTP/HTTPS with its game changing approach to use CRUD (create, retrieve, update, delete) operations to work with object storage. Hence the S3 protocol, which has become the de facto access protocol to object storage.

Yes, I wrote those 2 blogs 11 and 9 years ago respectively because I saw that object storage technology was a natural fit to the burgeoning new world of storage computing. It has since come true many times over.

Continue reading

Building Trust in the Storage Brand

Trust is everything. When done right, the brand is trust.

One Wikibon article last month “Does Hardware (still) Matter?” touched on my sentiments and hit close to the heart. As the world becomes more and more data driven and cloud-centric, the prominence of IT infrastructure has diminished from the purview of the boardroom. The importance of IT infrastructure cannot be discounted but in this new age, storage infrastructure has become invisible.

In the seas of both on-premises and hybrid storage technology solutions, everyone is trying to stand out, trying to eke the minutest ounces of differentiation and advantage to gain the customer’s micro-attention. With all the drum beatings, the loyalty of the customer can switch in an instance unless we build trust.

I ponder a few storage industry variables that help build trust.

Open source Communities and tribes

During the hey-days of proprietary software and OSes, protectionism was key to guarding the differentiations and the advantages. Licenses were common, and some were paired with the hardware hostid to create that “power combination”. And who can forget those serial dongles license keys? Urgh!!

Since the open source movement (Read The Cathedral and the Bazaar publication) began, the IT world has begun to trust software and OSes more and more. Open Source communities grew and technology tribes were formed in all types of niches, including storage software. Trust grew because the population of the communities kept the vendors honest. Gone are the days of the Evil Empire. Even Microsoft® became a ‘cool kid’.

TRUST

One open source storage filesystem I worked extensively on is OpenZFS. From its beginnings after Open Solaris® (remember build 134), becoming part of the Illumos project and then later in FreeBSD® and Linux upstream. Trust in OpenZFS was developed over time because of the open source model. It has spawned many storage projects including FreeNAS™ which later became TrueNAS®.

Continue reading

What happened to NDMP?

The acronym NDMP shows up once in a while in NAS (Network Attached Storage) upgrade tenders. And for the less informed, NDMP (Network Data Management Protocol) was one of the early NAS data management (more like data mover specifications) initiatives to backup NAS devices, especially the NAS appliances that run proprietary operating systems code.

NDMP Logo

Backup software vendors often have agents developed specifically for an operating system or an operating environment. But back in the mid-1990s, 2000s, the internal file structures of these proprietary vendors were less exposed, making it harder for backup vendors to develop agents for them. Furthermore, there was a need to simplify the data movements of NAS files between backup servers and the NAS as a client, to the media servers and eventually to the tape or disk targets. The dominant network at the time ran at 100Mbits/sec.

To overcome this, Network Appliance® and PDC Solutions/Legato® developed the NDMP protocol, allowing proprietary NAS devices to run a standardized client-server architecture with the NDMP server daemon in the NAS and the backup service running as an NDMP client. Here is a simplified look at the NDMP architecture.

NDMP Client-Server Architecture

Continue reading

Where are your files living now?

[ This is Part One of a longer conversation ]

EMC2 (before the Dell® acquisition) in the 2000s had a tagline called “Where Information Lives™**. This was before the time of cloud storage. The tagline was an adage of enterprise data storage, proper and contemporaneous to the persistent narrative at the time – Data Consolidation. Within the data consolidation stories, thousands of files and folders moved about the networks of the organizations, from servers to clients, clients to servers. NAS (Network Attached Storage) was, and still is the work horse of many, many organizations.

[ **Side story ] There was an internal anti-EMC joke within NetApp® called “Information has a new address”.

EMC tagline “Where Information Lives”

This was a time where there were almost no concerns about Shadow IT; ransomware were less known; and most importantly, almost everyone knew where their files and folders were, more or less (except in Oil & Gas upstream – to be told in later in this blog). That was because there were concerted attempts to consolidate data, and inadvertently files and folders, in the organization.

Even when these organizations were spread across the world, there were distributed file technologies at the time that could deliver files and folders in an acceptable manner. Definitely not as good as what we have today in a cloudy world, but acceptable. I personally worked a project setting up Andrew File Systems for Intel® in Penang in the mid-90s, almost joined Tacit Networks in the mid-2000s, dabbled on Microsoft® Distributed File System with NetApp® and Windows File Servers while fixing the mountains of issues in deploying the worldwide GUSto (Global Unified Storage) Project in Shell 2006. Somewhere in my chronological listings, Acopia Networks (acquired by F5) and of course, EMC2 Rainfinity and NetApp® NuView OEM, Virtual File Manager.

The point I am trying to make here is most IT organizations had a good grip of where the files and folders were. I do not think this is very true anymore. Do you know where your files and folders are living today? 

Continue reading

Layers in Storage – For better or worse

Storage arrays and storage services are built upon by layers and layers beneath its architecture. The physical components of hard disk drives and solid states are abstracted into RAID volumes, virtualized into other storage constructs before they are exposed as shares/exports, LUNs or objects to the network.

Everyone in the storage networking industry, is cognizant of the layers and it is the foundation of knowledge and experience. The public cloud storage services side is the same, albeit more opaque. Nevertheless, both have layers.

In the early 2000s, SNIA® Technical Council outlined a blueprint of the SNIA® Shared Storage Model, a framework describing layers and properties of a storage system and its services. It was similar to the OSI 7-layer model for networking. The framework helped many industry professionals and practitioners shaped their understanding and the development of knowledge in their respective fields. The layering scheme of the SNIA® Shared Storage Model is shown below:

SNIA Shared Storage Model – The layering scheme

Storage vendors layering scheme

While SNIA® storage layers were generic and open, each storage vendor had their own proprietary implementation of storage layers. Some of these architectures are simple, but some, I find a bit too complex and convoluted.

Here is an example of the layers of the Automated Volume Management (AVM) architecture of the EMC® Celerra®.

EMC Celerra AVM Layering Scheme

I would often scratch my head about AVM. Disks were grouped into RAID groups, which are LUNs (Logical Unit Numbers). Then they were defined as Celerra® dvols (disk volumes), and stripes of the dvols were consolidated into a storage pool.

From the pool, a piece of a storage capacity construct, called a slice volume, were combined with other slice volumes into a metavolume which eventually was presented as a file system to the network and their respective NAS clients. Explaining this took an effort because I was the IP Storage product manager for EMC® between 2007 – 2009. It was a far cry from the simplicity of NetApp® ONTAP 7 architecture of RAID groups and volumes, and the WAFL® (Write Anywhere File Layout) filesystem.

Another complicated layered framework I often gripe about is Ceph. Here is a look of how the layers of CephFS is constructed.

Ceph Storage Layered Framework

I work with the OpenZFS filesystem a lot. It is something I am rather familiar with, and the layered structure of the ZFS filesystem is essentially simpler.

Storage architecture mixology

Engineers are bizarre when they get too creative. They have a can do attitude that transcends the boundaries of practicality sometimes, and boggles many minds. This is what happens when they have their own mixology ideas.

Recently I spoke to two magnanimous persons who had the idea of providing Ceph iSCSI LUNs to the ZFS filesystem in order to use the simplicity of NAS file sharing capabilities in TrueNAS® CORE. From their own words, Ceph NAS capabilities sucked. I had to draw their whole idea out in a Powerpoint and this is the architecture I got from the conversation.

There are 3 different storage subsystems here just to provide NAS. As if Ceph layers aren’t complicated enough, the iSCSI LUNs from Ceph are presented as Cinder volumes to the KVM hypervisor (or VMware® ESXi) through the Cinder driver. Cinder is the persistent storage volume subsystem of the Openstack® project. The Cinder volumes/hypervisor datastore are virtualized as vdisks to the respective VMs installed with TrueNAS® CORE and OpenZFS filesystem. From the TrueNAS® CORE, shares and exports are provisioned via the SMB and NFS protocols to Windows and Linux respectively.

It works! As I was told, it worked!

A.P.P.A.R.M.S.C. considerations

Continuing from the layered framework described above for NAS, other aspects beside the technical work have to be considered, even when it can work technically.

I often use a set of diligent data storage focal points when considering a good storage design and implementation. This is the A.P.P.A.R.M.S.C. Take for instance Protection as one of the points and snapshot is the technology to use.

Snapshots can be executed at the ZFS level on the TrueNAS® CORE subsystem. Snapshots can be trigged at the volume level in Openstack® subsystem and likewise, rbd snapshots at the Ceph subsystem. The question is, which snapshot at which storage subsystem is the most valuable to the operations and business? Do you run all 3 snapshots? How do you execute them in succession in a scheduled policy?

In terms of performance, can it truly maximize its potential? Can it churn out the best IOPS, and deliver at wire speed? What is the latency we can expect with so many layers from 3 different storage subsystems?

And supporting this said architecture would be a nightmare. Where do you even start the troubleshooting?

Those are just a few considerations and questions to think about when such a layered storage architecture along. IMHO, such a design was over-engineered. I was tempted to say “Just because you can, doesn’t mean you should

Elegance in Simplicity

Einstein (I think) quoted:

Einstein’s quote on simplicity and complexity

I am not saying that having too many layers is wrong. Having a heavily layered architecture works for many storage solutions out there, where they are often masked with a simple and intuitive UI. But in yours truly point of view, as a storage architecture enthusiast and connoisseur, there is beauty and elegance in simple designs.

The purpose here is to promote better understanding of the storage layers, and how they integrate and interact with each other to deliver the data services to the network. In the end, that is how most storage architectures are built.

 

Do we still need FAST (and its cohorts)?

In a recent conversation with an iXsystems™ reseller in Hong Kong, the topic of Storage Tiering was brought up. We went about our banter and I brought up the inter-array tiering and the intra-array tiering piece.

After that conversation, I started thinking a lot about intra-array tiering, where data blocks within the storage array were moved between fast and slow storage media. The general policy was simple. Find all the least frequently access blocks and move them from a fast tier like the SSD tier, to a slower tier like the spinning drives with different RPM speeds. And then promote the data blocks to the faster media when accessed frequently. Of course, there were other variables in the mix besides storage media and speeds.

My mind raced back 10 years or more to my first encounter with Compellent and 3PAR. Both were still independent companies then, and I had my first taste of intra-array tiering

The original Compellent and 3PAR logos

I couldn’t recall which encounter I had first, but I remembered the time of both events were close. I was at Impact Business Solutions in their office listening to their Compellent pitch. The Kuching boys (thank you Chyr and Winston!) were very passionate in evangelizing the Compellent Data Progression technology.

At about the same time, I was invited by PTC Singapore GM at the time, Ken Chua to grace their new Malaysian office and listen to their latest storage vendor partnership, 3PAR. I have known Ken through my NetApp® days, and he linked me up Nathan Boeger, 3PAR’s pre-sales consultant. 3PAR had their Adaptive Optimization (AO) disk tiering and Dynamic Optimization (DO) technology.

Continue reading

Dell EMC Isilon is an Emmy winner!

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog is of my own opinions and views ]

And the Emmy® goes to …

Yes, the Emmy® goes to Dell EMC Isilon! It was indeed a well deserved accolade and an honour!

Dell EMC Isilon had just won the Technology & Engineering Emmy® Awards a week before Storage Field Day 19, for their outstanding pioneering work on the NAS platform tiering technology of media and broadcasting content according to business value.

A lasting true clustered NAS

This is not a blog to praise Isilon but one that instill respect to a real true clustered, scale-out file system. I have known of OneFS for a long time, but never really took the opportunity to really put my hands on it since 2006 (there is a story). So here is a look at history …

Back in early to mid-2000, there was a lot of talks about large scale NAS. There were several players in the nascent scaling NAS market. NetApp was the filer king, with several competitors such as Polyserve, Ibrix, Spinnaker, Panasas and the young upstart Isilon. There were also Procom, BlueArc and NetApp’s predecessor Auspex. By the second half of the 2000 decade, the market consolidated and most of these NAS players were acquired.

Continue reading

DellEMC Project Nautilus Re-imagine Storage for Streams

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog is of my own opinions and views ]

Cloud computing will have challenges processing data at the outer reach of its tentacles. Edge Computing, as it melds with the Internet of Things (IoT), needs a different approach to data processing and data storage. Data generated at source has to be processed at source, to respond to the event or events which have happened. Cloud Computing, even with 5G networks, has latency that is not sufficient to how an autonomous vehicle react to pedestrians on the road at speed or how a sprinkler system is activated in a fire, or even a fraud detection system to signal money laundering activities as they occur.

Furthermore, not all sensors, devices, and IoT end-points are connected to the cloud at all times. To understand this new way of data processing and data storage, have a look at this video by Jay Kreps, CEO of Confluent for Kafka® to view this new perspective.

Data is continuously and infinitely generated at source, and this data has to be compiled, controlled and consolidated with nanosecond precision. At Storage Field Day 19, an interesting open source project, Pravega, was introduced to the delegates by DellEMC. Pravega is an open source storage framework for streaming data and is part of Project Nautilus.

Rise of  streaming time series Data

Processing data at source has a lot of advantages and this has popularized Time Series analytics. Many time series and streams-based databases such as InfluxDB, TimescaleDB, OpenTSDB have sprouted over the years, along with open source projects such as Apache Kafka®, Apache Flink and Apache Druid.

The data generated at source (end-points, sensors, devices) is serialized, timestamped (as event occurs), continuous and infinite. These are the properties of a time series data stream, and to make sense of the streaming data, new data formats such as Avro, Parquet, Orc pepper the landscape along with the more mature JSON and XML, each with its own strengths and weaknesses.

You can learn more about these data formats in the 2 links below:

DIY is difficult

Many time series projects started as DIY projects in many organizations. And many of them are still DIY projects in production systems as well. They depend on tribal knowledge, and these databases are tied to an unmanaged storage which is not congruent to the properties of streaming data.

At the storage end, the technologies today still rely on the SAN and NAS protocols, and in recent years, S3, with object storage. Block, file and object storage introduce layers of abstraction which may not be a good fit for streaming data.

Continue reading

Komprise is a Winner

[Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

I, for one perhaps have seen far too many “file lifecycle and data management” software solutions that involved tiering, hierarchical storage management, ILM or whatever you call them these days. If I do a count, I would have managed or implemented at least 5 to 6 products, including a home grown one.

The whole thing is a very crowded market and I have seen many which have come and gone, and so when the opportunity to have a session with Komprise came at Storage Field Day 19, I did not carry a lot of enthusiasm.

Continue reading