Object Storage becoming storage lingua franca of Edge-Core-Cloud

Data Fabric was a big buzzword going back several years. I wrote a piece talking about Data Fabric, mostly NetApp®’s,  almost 7 years ago, which I titled “The Transcendence of Data Fabric“. Regardless of storage brands and technology platforms, and each has its own version and interpretations, one thing holds true. There must be a one layer of Data Singularity. But this is easier said than done.

Fast forward to present. The latest buzzword is Edge-to-Core-Cloud or Cloud-to-Core-Edge. The proliferation of Cloud Computing services, has spawned beyond to multiclouds, superclouds and of course, to Edge Computing. Data is reaching to so many premises everywhere, and like water, data has found its way.

Edge-to-Core-to-Cloud (Gratitude thanks to https://www.techtalkthai.com/dell-technologies-opens-iot-solutions-division-and-introduces-distributed-core-architecture/)

The question on my mind is can we have a single storage platform to serve the Edge-to-Core-to-Cloud paradigm? Is there a storage technology which can be the seamless singularity of data? 7+ years onwards since my Data Fabric blog, The answer is obvious. Object Storage.

The ubiquitous object storage and the S3 access protocol

For a storage technology that was initially labeled “cheap and deep”, object storage has become immensely popular with developers, cloud storage providers and is fast becoming storage repositories for data connectors. I wrote a piece called “All the Sources and Sinks going to Object Storage” over a month back, which aptly articulate how far this technology has come.

But unknown to many (Google NASD and little is found), object storage started its presence in SNIA (it was developed in Carnegie-Mellon University prior to that) in the early 90s, then known as NASD (network attached secure disk). As it is made its way into the ANSI T10 INCITS standards development, it became known as Object-based Storage Device or OSD.

The introduction of object storage services 16+ years ago by Amazon Web Services (AWS) via their Simple Storage Services (S3) further strengthened the march of object storage, solidified its status as a top tier storage platform. It was to AWS’ genius to put the REST API over HTTP/HTTPS with its game changing approach to use CRUD (create, retrieve, update, delete) operations to work with object storage. Hence the S3 protocol, which has become the de facto access protocol to object storage.

Yes, I wrote those 2 blogs 11 and 9 years ago respectively because I saw that object storage technology was a natural fit to the burgeoning new world of storage computing. It has since come true many times over.

Continue reading

All the Sources and Sinks going to Object Storage

The vocabulary of sources and sinks are beginning to appear in the world of data storage as we witness the new addition of data processing frameworks and the applications in this space. I wrote about this in my blog “Rethinking data. processing frameworks systems in real time” a few months ago, introducing my take on this budding new set of I/O characteristics and data ecosystem. I also started learning about the Kappa Architecture (and Lambda as well), a framework designed to craft and develop a set of amalgamated technologies to handle stream processing of a series of data in relation to time.

In Computer Science, sources and sinks are considered external entities that often serve as connectors of input and output of disparate systems. They are often not in the purview of data storage architects. Also often, these sources and sinks are viewed as black boxes, and their inner workings are hidden from the views of the data storage architects.

Diagram from https://developer.here.com/documentation/get-started/dev_guide/shared_content/topics/olp/concepts/pipelines.html

The changing facade of data stream processing presents the constant motion of data, the continuous data being altered as it passes through the many integrated sources and sinks. We are also see much of the data processed in-memory as much as possible. Thus, the data services from a traditional storage model of SAN and NAS may straggle with the requirements demanded by this new generation of data stream processing.

As the world of traditional data storage processing is expanding into data streams processing and vice versa, and the chatter of sources and sinks can no longer be ignored.

Continue reading

Computational Storage embodies Data Velocity and Locality

I have been earnestly observing the growth of Computational Storage for a number of years now.  It was known by several previous names, with the name “in-situ data processing” stuck with me the most. The Computational Storage nomenclature became more cohesive when SNIA® put together the CMSI (Compute Memory Storage Initiative) some time back. This initiative is where several standards bodies, the major technology players and several SIGs (special interest groups) in SNIA® collaborated to advance Computational Storage segment in the storage technology industry we know of today.

The use cases for Computational Storage are burgeoning, and the functional implementations of Computational Storage are becoming vital to tackle the explosive data tsunami. In 2018 IDC, in its Worldwide Global Datasphere Forecast 2021-2025 report, predicted that the world will have 175 ZB (zettabytes) of data. That number, according to hearsay, has been revised to a heady figure of 250ZB, given the superlative rate data is being originated, spawned and more.

Computational Storage driving factors

If we take the Computer Science definition of in-situ processing, Computational Storage can be distilled as processing data where it resides. In a nutshell, “Bring Compute closer to Storage“. This means that there is a processing unit within the storage subsystem which does not require the host CPU to perform processing. In a very simplistic manner, a RAID card in a storage array can be considered a Computational Storage device because it performs the RAID functions instead of the host CPU. But this new generation of Computational Storage has much more prowess than just the RAID function in a RAID card.

There are many factors in Computational Storage that make a lot sense. Here are a few:

  1. Voluminous data inundate the centralized architecture of the cloud platforms and the enterprise systems today. Much of the data come from end point devices – mobile devices, sensors, IoT, point-of-sales, video cameras, et.al. Pre-processing the data at the origin data points can help filter the data, reduce the size to be processed centrally, and secure the data before they are ingested into the central data processing systems
  2. Real-time processing of the data at the moment the data is received gives the opportunity to create the Velocity of Data Analytics. Much of the data do not need to move to a central data processing system for analysis. Often in use cases like autonomous vehicles, fraud detection, recommendation systems, disaster alerts etc require near instantaneous responses. Performing early data analytics at the data origin point has tremendous advantages.
  3. Moore’s Law is waning. The CPU (central processing unit) is no longer the center of the universe. We are beginning to see CPU offloading technologies to augment the CPU’s duties such as compression, encryption, transcoding and more. SmartNICs, DPUs (data processing units), VPUs (visual processing units), GPUs (graphics processing units), etc have come forth to formulate a new computing paradigm.
  4. Freeing up central resources with Computational Storage also accelerates the overall distributed data processing in the whole data architecture. The CPU and the adjoining memory subsystem are less required to perform context switching caused by I/O interrupts as in most of the compute/storage architecture today. The total effect relieves the CPU and giving back more CPU cycles to perform higher processing tasks, resulting in faster performance overall.
  5. The rise of memory interconnects is enabling a more distributed computing fabric of data processing subsystems. The rising CXL (Compute Express Link™) interconnect protocol, especially after the Gen-Z annex, has emerged a force to be reckoned with. This rise of memory interconnects will likely strengthen the testimony of Computational Storage in the fast approaching future.

Computational Storage Deployment Models

SNIA Computational Storage Universe in 2019

Continue reading

Time to Conflate Storage with Data Services

Around the year 2016, I started to put together a better structure to explain storage infrastructure. I started using the word Data Services Platform before what it is today. And I formed a pictorial scaffold to depict what I wanted to share. This was what I made at that time.

Data Services Platform (circa 2016)- Copyright Heoh Chin Fah

One of the reasons I am bringing this up again is many of the end users and resellers still look at storage from the perspective of capacity, performance and price. And as if two plus two equals five, many storage pre-sales and architects reciprocate with the same type of responses that led to the deteriorated views of the storage technology infrastructure industry as a whole. This situation irks me. A lot.

Continue reading

Celebrating MinIO

Essentially MinIO is a web server …

I vaguely recalled Anand Babu Periasamy (AB as he is known), the CEO of MinIO saying that when I first met him in 2017. I was fresh “playing around” with MinIO and instantly I fell in love with software technology. Wait a minute. Object storage wasn’t supposed to be so easy. It was not supposed to be that simple to set up and use, but MinIO burst into my storage universe like the birth of the Infinity Stones. There was a eureka moment. And I was attending one of the Storage Field Days in the US shortly after my MinIO discovery in late 2017. What an opportunity!

I could not recall how I made the appointment to meeting MinIO, but I recalled myself taking an Uber to their cosy office on University Avenue in Palo Alto to meet. Through Andy Watson (one of the CTOs then), I was introduced to AB, Garima Kapoor, MinIO’s COO and his wife, Frank Wessels, Zamin (one of the business people who is no longer there) and Ugur Tigli (East Coast CTO) who was on the Polycom. I was awe struck.

Last week, MinIO scored a major Series B round funding of USD103 million. It was delayed by the pandemic because I recalled Garima telling me that the funding was happening in 2020. But I think the delay made it better, because the world now is even more ready for MinIO than ever before.

Continue reading

A conceptual distributed enterprise HCI with open source software

Cloud computing has changed everything, at least at the infrastructure level. Kubernetes is changing everything as well, at the application level. Enterprises are attracted by tenets of cloud computing and thus, cloud adoption has escalated. But it does not have to be a zero-sum game. Hybrid computing can give enterprises a balanced choice, and they can take advantage of the best of both worlds.

Open Source has changed everything too because organizations now has a choice to balance their costs and expenditures with top enterprise-grade software. The challenge is what can organizations do to put these pieces together using open source software? Integration of open source infrastructure software and applications can be complex and costly.

The next version of HCI

Hyperconverged Infrastructure (HCI) also changed the game. Integration of compute, network and storage became easier, more seamless and less costly when HCI entered the market. Wrapped with a single control plane, the HCI management component can orchestrate VM (virtual machine) resources without much friction. That was HCI 1.0.

But HCI 1.0 was challenged, because several key components of its architecture were based on DAS (direct attached) storage. Scaling storage from a capacity point of view was limited by storage components attached to the HCI architecture. Some storage vendors decided to be creative and created dHCI (disaggregated HCI). If you break down the components one by one, in my opinion, dHCI is just a SAN (storage area network) to HCI. Maybe this should be HCI 1.5.

A new version of an HCI architecture is swimming in as Angelfish

Kubernetes came into the HCI picture in recent years. Without the weights and dependencies of VMs and DAS at the HCI server layer, lightweight containers orchestrated, mostly by, Kubernetes, made distribution of compute easier. From on-premises to cloud and in between, compute resources can easily spun up or down anywhere.

Continue reading

Rethinking data processing frameworks systems in real time

“Row, row, row your boat, gently down the stream…”

Except the stream isn’t gentle at all in the data processing’s new context.

For many of us in the storage infrastructure and data management world, the well known framework is storing and retrieve data from a storage media. That media could be a disk-based storage array, a tape, or some cloud storage where the storage media is abstracted from the users and the applications. The model of post processing the data after the data has safely and persistently stored on that media is a well understood and a mature one. Users, applications and workloads (A&W) process this data in its resting phase, retrieve it, work on it, and write it back to the resting phase again.

There is another model of data processing that has been bubbling over the years and now reaching a boiling point. Still it has not reached its apex yet. This is processing the data in flight, while it is still flowing as it passes through processing engine. The nature of this kind of data is described in one 2018 conference I chanced upon a year ago.

letgo marketplace processing numbers in 2018

  • * NRT = near real time

From a storage technology infrastructure perspective, this kind of data processing piqued my curiosity immensely. And I have been studying this burgeoning new data processing model in my spare time, and where it fits, bringing the understanding back into the storage infrastructure and data management side.

Continue reading

What the heck is Storage Modernization?

We often hear the word “modernization” thrown around these days. The push is to get the end user to refresh their infrastructure, and the storage infrastructure market is rife with modernization word. Is your storage ripe for “modernization“?

Many possibilities to modernize storage

To modernize, it has to be relative to legacy storage hardware, and the operating environment that came with it. But if the so-called “legacy” still does the job, should you modernize?

Big Data is right

When the word “Big Data” came into prominence a while back, it stirred the IT industry into a frenzy. At one point, Apache Hadoop became the poster elephant (pun intended) for this exciting new segment. So many Vs came out, but I settled with 4 Vs as the framework of my IT conversations. The 4Vs we often hear are:

  • Volume
  • Velocity
  • Variety
  • Veracity

Continue reading

The future of Fibre Channel in the Cloud Era

The world has pretty much settled that hybrid cloud is the way to go for IT infrastructure services today. Straddled between the enterprise data center and the infrastructure-as-a-service in public cloud offerings, hybrid clouds define the storage ecosystems and architecture of choice.

A recent Blocks & Files article, “Broadcom server-storage connectivity sales down but recovery coming” caught my attention. One segment mentioned that the server-storage connectivity sales was down 9% leading me to think “Is this a blip or is it a signal that Fibre Channel, the venerable SAN (storage area network) protocol is on the wane?

Fibre Channel Sign

Thus, I am pondering the position of Fibre Channel SANs in the cloud era. Where does it stand now and in the near future? Continue reading

The hot cold times of HCI

Hyperconverged Infrastructure (HCI) is a hot technology. It has been for the past decade since Nutanix™ took the first mover advantage from the Converged Infrastructure (CI) technology segment and made it pretty much its ownfor a while.

Hyper Converged Infrastructure

But the HCI market (not the technology) is a strange one. It is hot. It is cold. The perennial leader, Nutanix™, has yet to eke out a profitable year. VMware® is strong in the market. Cisco™, which was hot with their HyperFlex solution in 2019, was also stopped short with a dismal decline in the IDC Worldwide HCI 2Q2020 tracker below:

IDC Worldwide Hyperconverged Infrastructure Tracker – 2Q2020

dHCI = Disaggregated or discombobulated? 

dHCI is known as disaggregated HCI. The disaggregation part is disaggregated hardware, especially on the storage part. Vendors like HPE® with Nimble Storage, Hitachi Vantara, NetApp® and a few more have touted the disaggregation of the performance and capacity, the separation of storage and compute as a value proposition but through close inspection, it is just another marketing ploy to attach a SAN storage to servers. It was marketing old wine in a new bottle. As rightly pointed out by my friend, Charles Chow of Commvault® quoted in his blog

Continue reading