All the Sources and Sinks going to Object Storage

The vocabulary of sources and sinks are beginning to appear in the world of data storage as we witness the new addition of data processing frameworks and the applications in this space. I wrote about this in my blog “Rethinking data. processing frameworks systems in real time” a few months ago, introducing my take on this budding new set of I/O characteristics and data ecosystem. I also started learning about the Kappa Architecture (and Lambda as well), a framework designed to craft and develop a set of amalgamated technologies to handle stream processing of a series of data in relation to time.

In Computer Science, sources and sinks are considered external entities that often serve as connectors of input and output of disparate systems. They are often not in the purview of data storage architects. Also often, these sources and sinks are viewed as black boxes, and their inner workings are hidden from the views of the data storage architects.

Diagram from https://developer.here.com/documentation/get-started/dev_guide/shared_content/topics/olp/concepts/pipelines.html

The changing facade of data stream processing presents the constant motion of data, the continuous data being altered as it passes through the many integrated sources and sinks. We are also see much of the data processed in-memory as much as possible. Thus, the data services from a traditional storage model of SAN and NAS may straggle with the requirements demanded by this new generation of data stream processing.

As the world of traditional data storage processing is expanding into data streams processing and vice versa, and the chatter of sources and sinks can no longer be ignored.

Continue reading

The young report card on Decentralized Storage

I kept this blog in my queue for over 4 months. I was reluctant to publish it because I thought the outrageous frenzies of NFTs (non-fungible tokens), metaverses and web3 were convoluting the discussions on the decentralized storage topic. 3 weeks back, a Google Trends search for these 3 opaque terms over 90 days showed that the worldwide fads were waning. Here was the Google Trends output on April 2, 2022:

Google Trends on NFT, metaverse and web3

Decentralized storage intrigues me. I like to believe in its potential and I often try to talk to people to strengthen the narratives, and support its adoption where it fits. But often, the real objectives of decentralized storage are obfuscated by the polarized conversations about cryptocurrencies that are pegged to their offerings, NFTs (non-fungible tokens), DAOs (decentralized autonomous organizations) and plenty of hyperboles with bewildering facts as well.

But I continue to seek sustainable conversations about decentralized storage without the sway of the NFTs or the cryptos. After dipping in my toes and experiencing with HODLers, and looking at the return to sanity, I believe we can discuss decentralized storage with better clarity now. The context is to position decentralized storage to the mainstream, specifically to business organizations already immersed in centralized storage. Here is my fledgling report card on decentralized storage.

Continue reading

Time to Conflate Storage with Data Services

Around the year 2016, I started to put together a better structure to explain storage infrastructure. I started using the word Data Services Platform before what it is today. And I formed a pictorial scaffold to depict what I wanted to share. This was what I made at that time.

Data Services Platform (circa 2016)- Copyright Heoh Chin Fah

One of the reasons I am bringing this up again is many of the end users and resellers still look at storage from the perspective of capacity, performance and price. And as if two plus two equals five, many storage pre-sales and architects reciprocate with the same type of responses that led to the deteriorated views of the storage technology infrastructure industry as a whole. This situation irks me. A lot.

Continue reading

HODLing Decentralized Storage is not zero sum

I have been dipping my toes into decentralized storage. I wrote about “Crossing the Chasm” last month where most early technologies have to experience to move into the mainstream adoption. I believe the same undertaking is going on for decentralized storage and the undercurrents are beginning to feel like a tidal wave. However, the clarion calls and the narratives around decentralized storage are beginning to sound the same after several months on researching the subject.

Salient points of decentralized storage

I have summarized a bunch of these arguments for decentralized storage. They are:

  • Democratization of cloud storage services separate from the hyperscaling behemoths of Web2
  • Inherent data security with default encryption, immutability and blockchain-ed. (most decentralized storage are blockchain-based. A few are not)
  • Data privacy with the security key for data decryption and authentication with the data owner(s)
  • No centralized control of data storage services, prices, market transparency and sovereignty
  • Green with more efficient energy consumption compared to Bitcoin
  • Data durability with data sharding creating no single point of failure and maintaining continuous data access services with geo content dispersal

Rocket fuel – The cryptos

Most early adoptions of a new technology require some sort of bliztscaling momentum to break free from the gravity of the old one. The cryptocurrencies pegged to many decentralized storage platforms are the rocket fuel to power the conversations and the narratives of the decentralized storage today. I probably counted over a hundred of these types of cryptocurrencies, with more jumping into the bandwagon as the gravy train moves ahead.

The table below is part of a TechTarget Search Storage article “7 Decentralized Storage Networks compared“. I found this article most enlightening.

7 Decentralized Storage Compared

Continue reading

Celebrating MinIO

Essentially MinIO is a web server …

I vaguely recalled Anand Babu Periasamy (AB as he is known), the CEO of MinIO saying that when I first met him in 2017. I was fresh “playing around” with MinIO and instantly I fell in love with software technology. Wait a minute. Object storage wasn’t supposed to be so easy. It was not supposed to be that simple to set up and use, but MinIO burst into my storage universe like the birth of the Infinity Stones. There was a eureka moment. And I was attending one of the Storage Field Days in the US shortly after my MinIO discovery in late 2017. What an opportunity!

I could not recall how I made the appointment to meeting MinIO, but I recalled myself taking an Uber to their cosy office on University Avenue in Palo Alto to meet. Through Andy Watson (one of the CTOs then), I was introduced to AB, Garima Kapoor, MinIO’s COO and his wife, Frank Wessels, Zamin (one of the business people who is no longer there) and Ugur Tigli (East Coast CTO) who was on the Polycom. I was awe struck.

Last week, MinIO scored a major Series B round funding of USD103 million. It was delayed by the pandemic because I recalled Garima telling me that the funding was happening in 2020. But I think the delay made it better, because the world now is even more ready for MinIO than ever before.

Continue reading

Rethinking data processing frameworks systems in real time

“Row, row, row your boat, gently down the stream…”

Except the stream isn’t gentle at all in the data processing’s new context.

For many of us in the storage infrastructure and data management world, the well known framework is storing and retrieve data from a storage media. That media could be a disk-based storage array, a tape, or some cloud storage where the storage media is abstracted from the users and the applications. The model of post processing the data after the data has safely and persistently stored on that media is a well understood and a mature one. Users, applications and workloads (A&W) process this data in its resting phase, retrieve it, work on it, and write it back to the resting phase again.

There is another model of data processing that has been bubbling over the years and now reaching a boiling point. Still it has not reached its apex yet. This is processing the data in flight, while it is still flowing as it passes through processing engine. The nature of this kind of data is described in one 2018 conference I chanced upon a year ago.

letgo marketplace processing numbers in 2018

  • * NRT = near real time

From a storage technology infrastructure perspective, this kind of data processing piqued my curiosity immensely. And I have been studying this burgeoning new data processing model in my spare time, and where it fits, bringing the understanding back into the storage infrastructure and data management side.

Continue reading

How well do you know your data and the storage platform that processes the data

Last week was consumed by many conversations on this topic. I was quite jaded, really. Unfortunately many still take a very simplistic view of all the storage technology, or should I say over-marketing of the storage technology. So much so that the end users make incredible assumptions of the benefits of a storage array or software defined storage platform or even cloud storage. And too often caveats of turning on a feature and tuning a configuration to the max are discarded or neglected. Regards for good storage and data management best practices? What’s that?

I share some of my thoughts handling conversations like these and try to set the right expectations rather than overhype a feature or a function in the data storage services.

Complex data networks and the storage services that serve it

I/O Characteristics

Applications and workloads (A&W) read and write from the data storage services platforms. These could be local DAS (direct access storage), network storage arrays in SAN and NAS, and now objects, or from cloud storage services. Regardless of structured or unstructured data, different A&Ws have different behavioural I/O patterns in accessing data from storage. Therefore storage has to be configured at best to match these patterns, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:

  • Random and Sequential patterns
  • Block sizes of these A&Ws ranging from typically 4K to 1024K.
  • Causal effects of synchronous and asynchronous I/Os to and from the storage

Continue reading

The future of Fibre Channel in the Cloud Era

The world has pretty much settled that hybrid cloud is the way to go for IT infrastructure services today. Straddled between the enterprise data center and the infrastructure-as-a-service in public cloud offerings, hybrid clouds define the storage ecosystems and architecture of choice.

A recent Blocks & Files article, “Broadcom server-storage connectivity sales down but recovery coming” caught my attention. One segment mentioned that the server-storage connectivity sales was down 9% leading me to think “Is this a blip or is it a signal that Fibre Channel, the venerable SAN (storage area network) protocol is on the wane?

Fibre Channel Sign

Thus, I am pondering the position of Fibre Channel SANs in the cloud era. Where does it stand now and in the near future? Continue reading

Data Sovereignty – A boon or a bane?

Data across borders – Data Sovereignty

I really did not want to write Data Sovereignty in the way I have written it now. I wanted to write it in a happy manner, but as recent circumstances appeared, the outlook began to dim. I apologize if my commentary is bleak.

Last week started very well. I was preparing for the iXsystems™ + Nextcloud webinar on Wednesday, August 25th 2021. After talking to the wonderful folks at Nextcloud (Thanks Markus, Uwe and Maxime!), the central theme of the webinar was on Data Sovereignty and Data Control. The notion of GDPR (General Data Protection Regulation) has already  permeated into EU (European Union) entities, organizations and individuals alike, and other sovereign states around the world are following suit. Prominent ones on my radar in the last 2 years were the California Consumer Privacy Act (CCPA) and Vietnam Personal Data Protection Act 2020.

Continue reading

Where are your files living now?

[ This is Part One of a longer conversation ]

EMC2 (before the Dell® acquisition) in the 2000s had a tagline called “Where Information Lives™**. This was before the time of cloud storage. The tagline was an adage of enterprise data storage, proper and contemporaneous to the persistent narrative at the time – Data Consolidation. Within the data consolidation stories, thousands of files and folders moved about the networks of the organizations, from servers to clients, clients to servers. NAS (Network Attached Storage) was, and still is the work horse of many, many organizations.

[ **Side story ] There was an internal anti-EMC joke within NetApp® called “Information has a new address”.

EMC tagline “Where Information Lives”

This was a time where there were almost no concerns about Shadow IT; ransomware were less known; and most importantly, almost everyone knew where their files and folders were, more or less (except in Oil & Gas upstream – to be told in later in this blog). That was because there were concerted attempts to consolidate data, and inadvertently files and folders, in the organization.

Even when these organizations were spread across the world, there were distributed file technologies at the time that could deliver files and folders in an acceptable manner. Definitely not as good as what we have today in a cloudy world, but acceptable. I personally worked a project setting up Andrew File Systems for Intel® in Penang in the mid-90s, almost joined Tacit Networks in the mid-2000s, dabbled on Microsoft® Distributed File System with NetApp® and Windows File Servers while fixing the mountains of issues in deploying the worldwide GUSto (Global Unified Storage) Project in Shell 2006. Somewhere in my chronological listings, Acopia Networks (acquired by F5) and of course, EMC2 Rainfinity and NetApp® NuView OEM, Virtual File Manager.

The point I am trying to make here is most IT organizations had a good grip of where the files and folders were. I do not think this is very true anymore. Do you know where your files and folders are living today? 

Continue reading