The Currency to grow Decentralized Storage

Unless you have been living under a rock in the past months, the fervent and loud, but vague debates of web3.0 have been causing quite a scene on the Internet. Those tiny murmurs a few months ago have turned into an avalanche of blares and booms, with both believers and detractors crying out their facts and hyperboles.

Within the web3.0, decentralized storage technologies have been rising to a crescendo. So many new names have come forth into the decentralized storage space, most backed by blockchain and incentivized by cryptocurrencies and is putting the 19th century California Gold Rush to shame.

At present, the decentralized storage market segment is fluid, very vibrant and very volatile. Being the perennial storage guy that I am, I would very much like the decentralized storage to be sustainably successful but first, it has to make sense. Logic must prevail before confidence follows.

Classic “Crossing the Chasm”

To understand this decentralization storage chaos, we must understand where it is now, and where it is going. History never forgets to teach us of the past to be intelligible in the fast approaching future.

I look to this situation as a classic crossing the chasm case. This Crossing the Chasm concept was depicted in Geoffrey Moore’s 1991 book of the same name. In his book, he spoke well about the Technology Adoption Cycle that classifies and demonstrates the different demographics and psychological progression (and regression) of how a technology is taken to mainstream.

Geoffrey Moore’s Crossing the Chasm Technology (Disruption) Adoption Cycle

As a new technology enters the market, the adoption is often fueled by the innovators, the testers, the crazy ones. It progresses and the early adopters set in. Here we get the believers, the fanatics, the cults that push the envelope a bit further, going against the institutions and the conventions. This, which is obvious, describes the early adopter stage of the decentralized storage today.

Like all technologies, it has to go mainstream to be profitable and to get there, its value to the masses must be well defined to be accepted. This is the market segment that decentralized storage must move to, to the early majority stage. But there is a gap, rightly pointed out and well defined by Geoffrey Moore. The “Chasm“. [ Note: To read about why the chasm, read this article ].

So how will decentralized storage cross the chasm to the majority of the market?

Continue reading

Rethinking data processing frameworks systems in real time

“Row, row, row your boat, gently down the stream…”

Except the stream isn’t gentle at all in the data processing’s new context.

For many of us in the storage infrastructure and data management world, the well known framework is storing and retrieve data from a storage media. That media could be a disk-based storage array, a tape, or some cloud storage where the storage media is abstracted from the users and the applications. The model of post processing the data after the data has safely and persistently stored on that media is a well understood and a mature one. Users, applications and workloads (A&W) process this data in its resting phase, retrieve it, work on it, and write it back to the resting phase again.

There is another model of data processing that has been bubbling over the years and now reaching a boiling point. Still it has not reached its apex yet. This is processing the data in flight, while it is still flowing as it passes through processing engine. The nature of this kind of data is described in one 2018 conference I chanced upon a year ago.

letgo marketplace processing numbers in 2018

  • * NRT = near real time

From a storage technology infrastructure perspective, this kind of data processing piqued my curiosity immensely. And I have been studying this burgeoning new data processing model in my spare time, and where it fits, bringing the understanding back into the storage infrastructure and data management side.

Continue reading

How well do you know your data and the storage platform that processes the data

Last week was consumed by many conversations on this topic. I was quite jaded, really. Unfortunately many still take a very simplistic view of all the storage technology, or should I say over-marketing of the storage technology. So much so that the end users make incredible assumptions of the benefits of a storage array or software defined storage platform or even cloud storage. And too often caveats of turning on a feature and tuning a configuration to the max are discarded or neglected. Regards for good storage and data management best practices? What’s that?

I share some of my thoughts handling conversations like these and try to set the right expectations rather than overhype a feature or a function in the data storage services.

Complex data networks and the storage services that serve it

I/O Characteristics

Applications and workloads (A&W) read and write from the data storage services platforms. These could be local DAS (direct access storage), network storage arrays in SAN and NAS, and now objects, or from cloud storage services. Regardless of structured or unstructured data, different A&Ws have different behavioural I/O patterns in accessing data from storage. Therefore storage has to be configured at best to match these patterns, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:

  • Random and Sequential patterns
  • Block sizes of these A&Ws ranging from typically 4K to 1024K.
  • Causal effects of synchronous and asynchronous I/Os to and from the storage

Continue reading

Storage Elephant Compute Birds

Data movement is expensive. Not just costs, but also latency and resources as well. Thus there were many narratives to move compute closer to where the data is stored because moving compute is definitely more economical than moving data. I borrowed the analogy of the 2 animals from some old NetApp® slides which depicted storage as the elephant, and compute as birds. It was the perfect analogy, because the storage is heavy and compute is light.

“Close up of a white Great Egret perching on top of an African Elephant aa Amboseli national park, Kenya”

Before the animals representation came about I used to use the term “Data locality, Data Mobility“, because of past work on storage technology in the Oil & Gas subsurface data management pipeline.

Take stock of your data movement

I had recent conversations with an end user who has been paying a lot of dollars keeping their “backup” and “archive” in AWS Glacier. The S3 storage is cheap enough to hold several petabytes of data for years, because the IT folks said that the data in AWS Glacier are for “backup” and “archive”. I put both words in quotes because they were termed as “backup” and “archive” because of their enterprise practice. However, the face of their business is changing. They are in manufacturing, oil and gas downstream, and the definitions of “backup” and “archive” data has changed.

For one, there is a strong demand for reusing the past data for various reasons and these datasets have to be recalled from their cloud storage. Secondly, their data movement activities still mimicked what they did in the past during their enterprise storage days. It was a classic lift-and-shift when they moved to the cloud, and not taking stock of  their data movements and the operations they ran on these datasets. Still ongoing, their monthly AWS cost a bomb.

Continue reading

Right time for Andrew. The Filesystem that is.

I couldn’t hold my excitement when I discovered Auristor® early last week. I stumbled upon this Computerweekly article “Want to side step Public Cloud? Auristor® offers global file storage.” Given the many news not exactly praising the public cloud storage vendors nowadays, the article’s title caught my attention. Immediately Andrew File System (AFS) was there. I was perplexed at first because I have never seen or heard a commercial version of AFS before. This news gave me goosebumps.

For the curious, I am sure many will ask who is this Andrew anyway? What is my relationship with this Andrew?

One time with Andrew

A bit of my history. I recalled quite vividly helping Intel in Penang, Malaysia to implement their globally distributed file caching mechanism with the NetApp® filer’s NFS. It was probably 2001 and I believed Intel wanted to share their engineering computing (EC) files between their US facilities and Intel Penang Design Center (PDC). As I worked along with the Intel folks, I found out that this distributed file caching technology was called Andrew File System (AFS).

Although I couldn’t really recalled how the project went, I remembered it being a bed of bugs at that time. But being the storage geek that I am, I obviously took some time to get to know Andrew the File System. 20 years have gone by, and I never really thought of AFS coming out as a commercial solution or even knew of it as one, until Auristor®,

Auristor Logo

Continue reading

What the heck is Storage Modernization?

We often hear the word “modernization” thrown around these days. The push is to get the end user to refresh their infrastructure, and the storage infrastructure market is rife with modernization word. Is your storage ripe for “modernization“?

Many possibilities to modernize storage

To modernize, it has to be relative to legacy storage hardware, and the operating environment that came with it. But if the so-called “legacy” still does the job, should you modernize?

Big Data is right

When the word “Big Data” came into prominence a while back, it stirred the IT industry into a frenzy. At one point, Apache Hadoop became the poster elephant (pun intended) for this exciting new segment. So many Vs came out, but I settled with 4 Vs as the framework of my IT conversations. The 4Vs we often hear are:

  • Volume
  • Velocity
  • Variety
  • Veracity

Continue reading

The future of Fibre Channel in the Cloud Era

The world has pretty much settled that hybrid cloud is the way to go for IT infrastructure services today. Straddled between the enterprise data center and the infrastructure-as-a-service in public cloud offerings, hybrid clouds define the storage ecosystems and architecture of choice.

A recent Blocks & Files article, “Broadcom server-storage connectivity sales down but recovery coming” caught my attention. One segment mentioned that the server-storage connectivity sales was down 9% leading me to think “Is this a blip or is it a signal that Fibre Channel, the venerable SAN (storage area network) protocol is on the wane?

Fibre Channel Sign

Thus, I am pondering the position of Fibre Channel SANs in the cloud era. Where does it stand now and in the near future? Continue reading

Windows SMB synchronous writes with OpenZFS

Sometimes I get really pissed off with myself because I have taken a bigoted view, and ended up with eggs on my face. The past week was like that, and the problem was gnawing me on the inside all week, because I was determined to balance my equilibrium by finding the answer.

Early in the week, I was having a conversation with a potential customer. It evolved around the missing 10 seconds or so of the video footage between the users of a popular video editing software. The company had 70% Windows users, and 30% users on the Mac, both sides accessing the NAS device. The issue was the editors on the Windows side will store the raw and edited files to the NAS, but when the Mac users read them, they will often find 10 seconds or so of the stored video files missing.

The likeliest culprit of this problem is the way the SMB protocol write I/O behaves in Windows and in MacOS. Windows SMB, by default, writes I/O asynchronously while SMB on MacOS writes I/O synchronously.

I had a strong conviction I had the answer to this issue but this was not a TrueNAS®, It was another brand of NAS that I did not have knowledge of, and so, I left the conversation feeling quite embarrassed because I had the answer only on the TrueNAS® server side, not on the Windows client side. Bigotry blinded me. Hmmph! 

SMB (Server Message Block) client-server model

Continue reading

Data Sovereignty – A boon or a bane?

Data across borders – Data Sovereignty

I really did not want to write Data Sovereignty in the way I have written it now. I wanted to write it in a happy manner, but as recent circumstances appeared, the outlook began to dim. I apologize if my commentary is bleak.

Last week started very well. I was preparing for the iXsystems™ + Nextcloud webinar on Wednesday, August 25th 2021. After talking to the wonderful folks at Nextcloud (Thanks Markus, Uwe and Maxime!), the central theme of the webinar was on Data Sovereignty and Data Control. The notion of GDPR (General Data Protection Regulation) has already  permeated into EU (European Union) entities, organizations and individuals alike, and other sovereign states around the world are following suit. Prominent ones on my radar in the last 2 years were the California Consumer Privacy Act (CCPA) and Vietnam Personal Data Protection Act 2020.

Continue reading

SSOT of Files

[ This is part two of “Where are your files living now?”. You can read Part One here ]

Data locality, Data mobility“. It was a term I like to use a lot when describing about data consolidation, leading to my mention about files and folders, and where they live in my previous blog. The thinking of where the files and folders are now as in everywhere as they can be in a plethora of premises stretches the premise of SSOT (Single Source of Truth). And this expatriation of files with minimal checks and balances disturbs me.

A year ago, just before I joined iXsystems, I was given Google® embargoed news, probably a week before they announced BigQuery Omni. Then I was interviewed by Enterprise IT News, a local Malaysian technology news portal to provide an opinion quote. This was what I quoted:

“’The data warehouse in the cloud’ managed services of Big Query is underpinned by Google® Anthos, its hybrid cloud infra and service management platform based on GKE (Google® Kubernetes Engine). The containerised applications, both on-prem and in the multi-clouds, would allow Anthos to secure and orchestrate infra, services and policy management under one roof.”

I further quoted ” The data repositories remain in each cloud is good to address data sovereignty, data security concerns but it did not mention how it addresses “single source of truth” across multi-clouds.

Single Source of Truth – regardless of repositories

Continue reading