Is banning public cloud storage a storm in a political teacup or something more?

Ah, India! The country that gave us the Silicon Valley of Asia in Bengaluru, and exports over USD$150 billion in software and IT services to the world.

Last week, the government of India banned the use of non-sanctioned public cloud storage services such as Google® Drive and Dropbox®, plus the use of VPNs (virtual private networks). This is nothing new. China has banned foreign VPN services and Dropbox® for years, while Google® was still adjusting its plans for China in 2020, with little hope of doing more than it is allowed to. I am not sure what India’s alternatives are, but China has had its own cloud storage services for a while now. So, what does this all mean?

India bans public cloud storage and VPN services

Public cloud storage services have been a boon for over a decade, ever since Dropbox® entered the scene in 2008. BYOD (bring your own device) became a constant on every IT person’s lips at that time. And with the teaser of 2GB or more of free capacity, many still rely on these public cloud storage services and their ability to sync with tablets, smartphones and laptops. But the proliferation of these services also propagated many cybersecurity risks, and yes, ransomware can infect these public cloud storage services. Even more noxious, the synchronization of files and folders between these services and on-premises storage devices makes it easy for infected data to spread, often with great efficacy.

Banning these widely available cloud storage applications is more than an inconvenience. Governments like China and India are shoring up their battlegrounds, as the battle for the protection and privacy of sovereign data will not only escalate but also create a domino effect in the geopolitical contest for dominance of the digital landscape.

We have already seen news of India asserting its stance against China. First there was an app called “Remove China App” that came up in the Google® Play Store in 2020. Also in 2020, India’s Ministry of Information Technology banned 59 apps, mostly from China, in order to protect the “sovereignty and integrity of India, defence of India, security of state and public order”.

This is not just a war between two of the most populous nations of the world. Underneath these acts, there is more to come, and it won’t just involve China and India. We will see other nations follow, with some already working to draw boundaries and demarcate digital borders in the name of data security, privacy, sovereignty and protection.

I hear some foreign vendors lamenting such a move. Most have already either complied with China’s laws or chosen to exit that market. This recent move by India may feel like a storm in a teacup, but beneath it all, the undercurrent is getting stronger each day. A digital geopolitical tempest is brewing.

Object Storage becoming storage lingua franca of Edge-Core-Cloud

Data Fabric was a big buzzword going back several years. I wrote a piece about Data Fabric, mostly NetApp®’s, almost 7 years ago, which I titled “The Transcendence of Data Fabric“. Regardless of storage brand and technology platform, and each has its own version and interpretation, one thing holds true: there must be one layer of Data Singularity. But this is easier said than done.

Fast forward to the present. The latest buzzword is Edge-to-Core-Cloud or Cloud-to-Core-Edge. The proliferation of Cloud Computing services has spawned multiclouds, superclouds and, of course, Edge Computing. Data is reaching so many premises everywhere, and like water, data has found its way.

Edge-to-Core-to-Cloud (Thanks to https://www.techtalkthai.com/dell-technologies-opens-iot-solutions-division-and-introduces-distributed-core-architecture/)

The question on my mind is: can we have a single storage platform to serve the Edge-to-Core-to-Cloud paradigm? Is there a storage technology that can be the seamless singularity of data? 7+ years on from my Data Fabric blog, the answer is obvious. Object Storage.

The ubiquitous object storage and the S3 access protocol

For a storage technology that was initially labeled “cheap and deep”, object storage has become immensely popular with developers and cloud storage providers, and is fast becoming the storage repository for data connectors. I wrote a piece called “All the Sources and Sinks going to Object Storage” over a month back, which aptly articulates how far this technology has come.

But unknown to many (Google NASD and little is found), object storage started its presence in SNIA (it was developed at Carnegie Mellon University prior to that) in the early 90s, then known as NASD (network-attached secure disks). As it made its way into the ANSI T10 INCITS standards development, it became known as Object-based Storage Device, or OSD.

The introduction of object storage services 16+ years ago by Amazon Web Services (AWS) via their Simple Storage Service (S3) further strengthened the march of object storage and solidified its status as a top-tier storage platform. It was AWS’ genius to put a REST API over HTTP/HTTPS, with its game-changing approach of using CRUD (create, retrieve, update, delete) operations to work with object storage. Hence the S3 protocol, which has become the de facto access protocol for object storage.
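As a minimal illustration of those CRUD operations (a sketch of my own, not from the original post), here is how they might look against an S3-compatible endpoint using the boto3 SDK. The bucket and key names are hypothetical placeholders, and credentials are assumed to come from the usual AWS configuration chain.

```python
# Minimal sketch of CRUD against an S3-compatible object store using boto3.
# Bucket, key and payloads are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")  # credentials/region resolved from the standard AWS config chain

bucket, key = "example-bucket", "reports/q3-summary.txt"

# Create: PUT a new object
s3.put_object(Bucket=bucket, Key=key, Body=b"hello object storage")

# Retrieve: GET the object back
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

# Update: objects are immutable, so an "update" is simply another PUT to the same key
s3.put_object(Bucket=bucket, Key=key, Body=b"hello again")

# Delete: remove the object
s3.delete_object(Bucket=bucket, Key=key)
```

The same four verbs work against most S3-compatible object stores, which is precisely why S3 has become the lingua franca access protocol.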

Yes, I wrote those 2 blogs 11 and 9 years ago respectively because I saw that object storage technology was a natural fit for the burgeoning new world of storage computing. It has since come true many times over.

Continue reading

Unstructured Data Observability with Datadobi StorageMAP

Let’s face it. Data is bursting through its storage seams. And every organization now stores so much data that they don’t know what they have.

By 2025, IDC predicts, 80% of the world’s data will be unstructured. IDC’s Global DataSphere Forecast 2021-2025 report projects that global data creation and replication will expand to 181 zettabytes, an unfathomable figure. Organizations are inundated. They struggle with data growth, with little understanding of what data they have, where the data is residing, what to do with the data, and how to manage the voluminous data deluge.

The simple knee-jerk action is to store it in cloud object storage, where the price of storage is $0.0000xxx/GB/month. But many IT departments in these organizations overlook the fact that the data they have parked in the cloud requires movement between the cloud and on-premises. I have been involved in numerous discussions where the customers realized that the data they had put in the cloud moved too frequently. Often it was an error of judgement or short-term blindness (blinded by the cheap storage costs no doubt), further exacerbated by the pandemic. These oversights have resulted in expensive and painful monthly API call and egress fees. Welcome to reality. Suddenly the cheap cloud storage doesn’t sound so cheap after all.
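To make that concrete, here is a back-of-the-envelope sketch of my own. Every rate and volume in it is an illustrative assumption, not any provider’s actual price list, but it shows how egress and API request charges can dwarf the headline per-GB storage price.

```python
# Back-of-the-envelope cloud object storage bill.
# All rates and volumes below are illustrative assumptions, not real price lists.
stored_tb            = 100            # TB parked in cloud object storage
storage_rate_gb      = 0.004          # $/GB/month ("cheap and deep" tier)
egress_rate_gb       = 0.09           # $/GB pulled back on-premises
monthly_egress_tb    = 20             # TB moved back each month
get_requests         = 50_000_000     # monthly GET calls from sync/analytics jobs
request_rate_per_10k = 0.004          # $ per 10,000 GET requests

storage_cost = stored_tb * 1024 * storage_rate_gb            # ~$409.60
egress_cost  = monthly_egress_tb * 1024 * egress_rate_gb     # ~$1,843.20
api_cost     = get_requests / 10_000 * request_rate_per_10k  # ~$20.00

print(f"storage : ${storage_cost:,.2f}")
print(f"egress  : ${egress_cost:,.2f}")
print(f"requests: ${api_cost:,.2f}")
# With these assumed numbers, egress alone is roughly 4.5x the storage line item.
```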

The same can be said about storing non-active unstructured data on primary storage. Many organizations have not been disciplined enough to practise good data management. The primary Tier 1 storage becomes bloated over time, grinding sluggishly as the data capacity grows. I/O processing becomes painfully slow and backups take longer and longer. Sound familiar?

The A in ABC

I brought up the ABC mantra a few blogs ago. A is for Archive First. It is part of my data protection consulting repertoire, and I use it often to advise IT organizations to be smart with their data management. Before archiving (some folks like to call it tiering, but I am not going down that argument today), we must know what to archive. We cannot blindly send all sorts of junk data to the secondary or tertiary storage premises. If we do that, it is akin to digging another hole to fill up the first one.

We must know which unstructured data to move, replicate or sync from the Tier 1 storage to a second (or third) less taxing storage premises. We must be able to see this data, observe its behaviour over time, and decide the best data management practice to apply to it. Take note that I said best data management practice and not best storage location in the previous sentence. There has to be a clear distinction: a data management strategy is more prudent than a “best” storage premises. The reason is that many organizations ignorantly think the best storage location (the thought of the “cheapest” always seems to creep up) is a good strategy, while ignoring the fact that data is like water. It moves from premises to premises, from on-prem to cloud, from cloud to other clouds. Data mobility is a variable in data management.
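As a simple illustration of “knowing what to archive” (a sketch of my own, not Datadobi StorageMAP’s method), one might walk a file tree and flag files that have not been accessed for, say, 180 days as archive candidates. The path and threshold below are assumptions, and access-time data is only meaningful if the filesystem is not mounted with noatime.

```python
# Sketch: flag files not accessed for N days as archive candidates.
# The root path and threshold are illustrative assumptions.
import os, time
from pathlib import Path

THRESHOLD_DAYS = 180
cutoff = time.time() - THRESHOLD_DAYS * 86400

def archive_candidates(root: str):
    """Yield (path, size_bytes) for files whose last access time is older than the cutoff."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            p = Path(dirpath) / name
            try:
                st = p.stat()
            except OSError:
                continue  # skip files that vanish or are unreadable mid-walk
            if st.st_atime < cutoff:
                yield p, st.st_size

total = 0
for path, size in archive_candidates("/mnt/tier1/projects"):
    total += size
    print(path)
print(f"Archive candidates: {total / 1024**3:.1f} GiB")
```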

Continue reading

Building Trust in the Storage Brand

Trust is everything. When done right, the brand is trust.

One Wikibon article last month “Does Hardware (still) Matter?” touched on my sentiments and hit close to the heart. As the world becomes more and more data driven and cloud-centric, the prominence of IT infrastructure has diminished from the purview of the boardroom. The importance of IT infrastructure cannot be discounted but in this new age, storage infrastructure has become invisible.

In the seas of both on-premises and hybrid storage technology solutions, everyone is trying to stand out, trying to eke out the minutest ounces of differentiation and advantage to gain the customer’s micro-attention. With all the drum beating, the loyalty of the customer can switch in an instant unless we build trust.

I ponder a few storage industry variables that help build trust.

Open source Communities and tribes

During the heydays of proprietary software and OSes, protectionism was key to guarding the differentiations and the advantages. Licenses were common, and some were paired with the hardware hostid to create that “power combination”. And who can forget those serial dongle license keys? Urgh!!

Since the open source movement began (read The Cathedral and the Bazaar), the IT world has come to trust software and OSes more and more. Open source communities grew and technology tribes formed in all types of niches, including storage software. Trust grew because the population of the communities kept the vendors honest. Gone are the days of the Evil Empire. Even Microsoft® became a ‘cool kid’.

TRUST

One open source storage filesystem I worked extensively on is OpenZFS. From its beginnings after OpenSolaris® (remember build 134?), it became part of the Illumos project and later found its way into FreeBSD® and Linux. Trust in OpenZFS was developed over time because of the open source model. It has spawned many storage projects, including FreeNAS™ which later became TrueNAS®.

Continue reading

All the Sources and Sinks going to Object Storage

The vocabulary of sources and sinks is beginning to appear in the world of data storage as we witness the addition of new data processing frameworks and applications in this space. I wrote about this in my blog “Rethinking data processing frameworks systems in real time” a few months ago, introducing my take on this budding new set of I/O characteristics and data ecosystem. I also started learning about the Kappa Architecture (and Lambda as well), a framework designed to craft and develop a set of amalgamated technologies to handle stream processing of a series of data in relation to time.

In Computer Science, sources and sinks are considered external entities that often serve as connectors for the input and output of disparate systems. They are often not in the purview of data storage architects. Just as often, these sources and sinks are viewed as black boxes, their inner workings hidden from the view of data storage architects.

Diagram from https://developer.here.com/documentation/get-started/dev_guide/shared_content/topics/olp/concepts/pipelines.html

The changing facade of data stream processing presents the constant motion of data, data being continuously altered as it passes through the many integrated sources and sinks. We also see much of the data being processed in-memory as much as possible. Thus, the data services of the traditional SAN and NAS storage models may struggle with the requirements demanded by this new generation of data stream processing.

As the world of traditional data storage expands into data stream processing and vice versa, the chatter of sources and sinks can no longer be ignored.
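To ground the vocabulary, here is a toy sketch of my own (not any particular framework’s API) of a source feeding a sink, where the sink lands micro-batches of events as objects in an S3-compatible bucket via boto3. The bucket name, prefix and batch size are illustrative assumptions.

```python
# Toy source -> sink pipeline: the source emits events, the sink lands them
# as JSON-lines micro-batches in an S3-compatible bucket. Names and sizes are illustrative.
import json, time, uuid
import boto3

def source(n=1000):
    """A stand-in source: yields timestamped events one at a time."""
    for i in range(n):
        yield {"id": i, "ts": time.time(), "value": i % 7}

class ObjectStoreSink:
    """A stand-in sink: buffers events in memory and flushes them as objects."""
    def __init__(self, bucket, prefix="events/", batch_size=200):
        self.s3 = boto3.client("s3")
        self.bucket, self.prefix, self.batch_size = bucket, prefix, batch_size
        self.buffer = []

    def write(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        key = f"{self.prefix}{uuid.uuid4()}.jsonl"
        body = "\n".join(json.dumps(e) for e in self.buffer).encode()
        self.s3.put_object(Bucket=self.bucket, Key=key, Body=body)
        self.buffer.clear()

sink = ObjectStoreSink("example-stream-bucket")
for event in source():
    sink.write(event)
sink.flush()  # drain whatever is left in the buffer
```

The point of the sketch is simply that the sink’s landing zone is object storage, which is exactly the pattern the post describes: sources and sinks increasingly terminate in S3-compatible repositories rather than SAN or NAS.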

Continue reading

The young report card on Decentralized Storage

I kept this blog in my queue for over 4 months. I was reluctant to publish it because I thought the outrageous frenzies of NFTs (non-fungible tokens), metaverses and web3 were convoluting the discussions on the decentralized storage topic. 3 weeks back, a Google Trends search for these 3 opaque terms over 90 days showed that the worldwide fads were waning. Here was the Google Trends output on April 2, 2022:

Google Trends on NFT, metaverse and web3

Decentralized storage intrigues me. I like to believe in its potential, and I often try to talk to people to strengthen the narratives and support its adoption where it fits. But often, the real objectives of decentralized storage are obfuscated by the polarized conversations about the cryptocurrencies pegged to their offerings, NFTs (non-fungible tokens), DAOs (decentralized autonomous organizations), and plenty of hyperbole and bewildering claims as well.

But I continue to seek sustainable conversations about decentralized storage without the sway of the NFTs or the cryptos. After dipping my toes in and engaging with HODLers, and seeing the return to sanity, I believe we can discuss decentralized storage with better clarity now. The context is to position decentralized storage for the mainstream, specifically for business organizations already immersed in centralized storage. Here is my fledgling report card on decentralized storage.

Continue reading

Backup – Lest we forget

World Backup Day – March 31st

Last week was World Backup Day. It is on March 31st every year so that you don’t lose your data and become an April Fool the next day.

Amidst the growing awareness of the importance of backup, no thanks to the ever-growing destructive nature of ransomware, it is important to look into other aspects of data protection as well – from both a data backup/recovery and a data security point of view.

3-2-1 Rule, A-B-C and Air Gaps

I highlighted the basic 3-2-1 rule before. It must always be paired with a set of practised processes and policies that cultivate in all stakeholders (aka the people) in the organization an understanding of the importance of protecting the data and ensuring data recoverability.

The A-B-C is to look at the production dataset and decide if the data should be stored in the Tier 1 storage. In most cases, the data becomes less active over time, and these datasets may be good candidates to be archived. Once archived, the production dataset is smaller, and data backup operations become lighter and faster, with positive knock-on effects as well.

Air gaps have returned to prominence with the heightened threats to data in recent years. The threats have pushed organizations to consider keeping data offsite and offline behind air gaps. Cost considerations and speed of recovery can be concerns, and logical air gaps are also gaining acceptance as an extra layer of data protection.

Backup is not total Data Protection cyberdefence

If we view data protection more holistically and comprehensively, backup (and recovery) is not the total data protection solution. We must ignore the fancy rhetoric of technology marketers claiming that backup is the solution to ensure data protection, because there is much more to it than that.

The well-respected NIST (National Institute of Standards and Technology) Cybersecurity Framework places Recovery (along with backup) as the last pillar of its framework.

NIST Cybersecurity Framework

Continue reading

IT Data practices and policies still wanting

There is an apt and honest editorial cartoon about Change.

From https://commons.wikimedia.org/wiki/File:Who-Wants-Change-Crowd-Change-Management-Yellow.png

I was a guest of the Channel Asia Executive Roundtable last week. I joined several luminaries in South East Asia to discuss the topic of “How Partners can bring value to the businesses to manage their remote workforce“.

Covid-19 decimated what we knew as work in general. The world had to pivot and now, 2+ years later, a hybrid workforce has emerged. The mixture of remote work, work-from-home (WFH), the physical office and everywhere else has brought a new mindset and new attitudes to employers and their staff alike. Without a doubt, the remote way of working is here to stay.

People won but did the process lose?

The knee-jerk reaction when the Covid lockdowns hit was to switch work to remote access to applications, whether on premises or in the clouds. Many companies had already moved to the software-as-a-service (SaaS) way of working, but not all had made the jump, just as not all of the companies’ applications were SaaS-based. Of course, the first thing these stranded companies did was to look for technologies to solve this unforeseen disorder.

People Process Technology.
Picture from https://iconstruct.com/blog/people-process-technology/

Continue reading

Please cultivate 3-2-1 and A-B-C of Data Management

My Sunday morning was muddled 2 weeks ago. There was a frenetic call from someone whom I knew a while back and he needed some advice. It turned out that his company’s files were encrypted and the “backups” (more on this later) were gone. With some detective work, I found that their files were stored in a Synology® NAS, often accessed remotely via QuickConnect, and “backed up” to Microsoft® Azure. I put “backup” in inverted commas because their definition of “backup” was using Synology®’s Cloud Sync to Azure. It is not a true backup but a file synchronization service that is often mislabeled as a data protection backup service.

All of his company’s projects files were encrypted and there were no backups to recover from. It was a typical ransomware cluster F crime scene.

I would have gloated, because many small-medium businesses like his take a very poor and lackadaisical attitude towards good data management practices. There is no use crying over spilled milk when prevention is better than cure. But having skipped the early investment in prevention, the cure will likely be 3x more expensive. And in this case, he wanted to use Deloitte® recovery services, which I did not know existed. Good luck with the recovery was all I said to him after my Sunday morning was turned topsy-turvy.

NAS is the ransomware goldmine

I have said it before and I am saying it again. NAS devices, especially the consumer and prosumer brands, are easy pickings because little attention has been paid to implementing good data management practices, either by the respective vendors or by the end users themselves. 2 years ago I was already seeing a consistent pattern of heightened ransomware attacks on NAS devices, especially the NAS devices that proliferated in the small-medium business market segment.

The WFH (work from home) practice triggered by the Covid-19 pandemic has made NAS devices essential for businesses. NAS devices are the workhorses of many businesses after all. The ease of connecting from anywhere with features similar to the Synology® QuickConnect I mentioned earlier, or through VPNs (virtual private networks), or self-created port forwarding (for those who want to save a quick buck [ sarcasm ]), opened the doors to bad actors and easy ransomware incursions. Good data management practices are often sidestepped or ignored in exchange for simplicity, convenience, and saving a few foolish dollars. Until ….

Continue reading