Open Source on my mind

Last week was packed with topics around Open Source software. I want to voice my opinions here (with a bit of ranting), and I hope not to rouse too many hostile comments from different parties and viewpoints. This blog is meant to create conversations, even controversial ones, but we must first agree that there will be disagreements. We must accept disagreements as part of this conversation.

In my 30-year career, Open Source has been a big part of my development and progress. The idea of freely using (certain) software without licensing implications, with the software itself openly available, was not always as welcome as it is now. I think the Open Source revolution has created an innovation movement that is still going strong. Not only has it permeated the IT industry completely, Open Source is now in almost every part of the technology-based industries as well. The Open Source influence is massive.

Open Source word cloud

In the beginning

In the beginning, in my beginning in 1992, access to software and its source code was closed. Coming from a VAX/VMS background (I was a system admin on my mathematics department’s minicomputers), Unix liberated my thinking. My final 6 months at the university were spent on systems programming in C, and it completely changed how I wanted my career to take shape. The mantra of “Free as in Freedom” in the GNU General Public License (GPL), which I got to know much later, sat well with my own tenets in life.

If closed source development models led to proprietary software and a centralized way of distributing licensed software, I would count the Open Source development models as one of the earliest decentralized technology frameworks. Down with the capitalistic corporations (aka Evil Empires)!

It was certainly a wonderful and generous way to make the world what it is today. It is a better world now.

Continue reading

Beyond the WORM with MinIO object storage

I find the terminology of WORM (Write Once Read Many) coming back into IT speak in recent years. In the era of rip and burn, WORM was a natural thing, when many of us “youngsters” used to copy files to a blank CD or DVD. I got to know how WORM worked when I learned that the laser in the CD burning process alters the chemical compound in a segment of the disc, rendering the “burned” segment unwritable once written, though it can be read many times.

At the enterprise level, I got to know about WORM while working with tape drives and tape libraries in the mid-90s. The objective of WORM is to save and archive data and files in a non-rewritable format for compliance reasons. And it was the data compliance and data protection parts that got me interested in data management. WORM is a big deal in many heavily regulated industries such as finance and banking, insurance, oil and gas, transportation and more.

Obviously things have changed. WORM, while very much alive in the ageless tape industry, has another up-and-coming medium in Object Storage. The new generation of data infrastructure and data management specialists are starting to take notice.

Worm Storage – Image from Hubstor (https://www.hubstor.net/blog/write-read-many-worm-compliant-storage/)

I take this opportunity to take MinIO object storage for a spin, creating WORM buckets that can easily be architected as data compliance repositories, with many applications across regulated industries. Here are some relevant steps.
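Before getting into the MinIO-specific steps, the WORM semantics themselves are worth pinning down. Here is a toy in-memory sketch of my own (the `WormBucket` class and its methods are illustrative, not MinIO's actual API): once an object is written with a retention period, overwrites and deletes are refused until the retention clock expires, while reads are always allowed. This is essentially what a compliance-mode object-lock bucket enforces.

```python
import time

class WormBucket:
    """Toy in-memory model of a WORM bucket: write once, read many."""

    def __init__(self, default_retention_secs: int):
        self.default_retention_secs = default_retention_secs
        self._objects = {}  # key -> (data, retain-until epoch seconds)

    def put(self, key: str, data: bytes):
        # Refuse to overwrite any object still under retention.
        if key in self._objects and time.time() < self._objects[key][1]:
            raise PermissionError(f"{key} is under retention; write refused")
        self._objects[key] = (data, time.time() + self.default_retention_secs)

    def get(self, key: str) -> bytes:
        # Reads are always allowed -- that is the "Read Many" half.
        return self._objects[key][0]

    def delete(self, key: str):
        # Deletes are also refused until the retention clock expires.
        if time.time() < self._objects[key][1]:
            raise PermissionError(f"{key} is under retention; delete refused")
        del self._objects[key]
```

In MinIO, the equivalent behaviour comes from creating the bucket with object locking enabled and setting a default retention rule, which the steps below walk through.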

Continue reading

Truthful information under attack. The call for Data Preservation

The slogan of The Washington Post is “Democracy Dies in Darkness“. Although not everyone agrees with the US brand of democracy, the idealism of WaPo‘s (the publication’s informal name) slogan is a powerful one. The venerable newspaper remains a beacon in the US as one of the most trustworthy sources of truthful, honest information.

4 Horsemen of Apocalypse with the 5th joining

Misinformation

Misinformation has become a clear and present danger to humanity. Fake news, misleading information and lies are fueling and propelling the propaganda and agendas of the powerful (and the deranged). Facts are blurred, obfuscated, even removed and replaced with misinformation, pushing undesirable effects that will harm present and future generations.

The work of SNIA®

Data preservation is part of Data Management. More than a decade ago, SNIA® set up a technical work group (TWG) on Long Term Retention and proposed a format for the long-term storage of digital information, called SIRF (Self-contained Information Retention Format). In the words of SNIA®, “The SIRF format enables long-term physical storage, cloud storage and tape-based containers effective and efficient ways to preserve and secure digital information for many decades, even with the ever-changing technology landscape.”

I don’t think battling misinformation was SNIA®’s original intent, but a vendor-neutral organization like this, presenting and promoting long-term data preservation, is needed more than ever. The need to protect the truth is paramount.

SNIA® continues to work with many organizations to create and grow the ecosystem for long term information retention and data preservation.

NFTs can save data

Despite the hullabaloo around NFTs (non-fungible tokens), very much soiled and discredited by present-day cryptocurrency speculation, I view data (and metadata) preservation as a strong use case for NFTs. The action is to digitize data into an NFT asset.

Here are a few arguments:

  1. NFTs are unique. Once they are verified and inserted into the blockchain, they are immutable. They cannot be modified, and each blockchain transaction carries a hash value that can never be replicated.
  2. NFTs are decentralized. Most of the NFTs we know of today are minted via a decentralized process. This means that the powerful cannot, most of the time, alter an NFT’s state according to their whims and fancies, unless the perpetrators can mount a Sybil attack on the blockchain.
  3. NFTs are secure. To be clear, the NFT itself is mostly very secure. Most of the high-profile incidents related to NFTs stem from authentication vulnerabilities and phishing, owing to the poor security housekeeping and hygiene of the participants.
  4. NFTs represent authenticity. The digital certification of an NFT as a data asset also defines its ownership and originality. The record of provenance is present and accounted for.
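
The immutability and authenticity arguments above both rest on ordinary cryptographic hashing. A minimal sketch in plain Python (no blockchain involved; the artifact bytes below are made up for illustration) shows why a recorded hash pins down a digital asset: any change to the data, however small, produces a completely different digest, which is what makes tampering detectable.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest that uniquely identifies this exact content."""
    return hashlib.sha256(data).hexdigest()

original = b"digitized scan of a historical artifact"
recorded = fingerprint(original)  # the value a minting process would record

# Later, anyone can verify an artifact against the recorded fingerprint.
assert fingerprint(original) == recorded          # an authentic copy matches
assert fingerprint(original + b".") != recorded   # any edit is detectable
```

The blockchain's role is only to make the recorded fingerprint itself tamper-proof and publicly verifiable; the detection of a modified artifact is pure hashing.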

Since NFTs started as a technology to prove the assets and artifacts of the creative industry, there are already a few organizations playing this role. Orygin Art is one that I found intriguing. Museums are also beginning to explore the potential of NFTs, including validating and verifying the origins of many historical artifacts, and digitizing these physical assets to preserve their value forever.

The technology behind NFTs is not without its weaknesses, but knowing what we know today, the potential is evident and the power of the technology has yet to be fully explored. It presents a strong case for preserving the integrity of truthful data, and of data as historical artifacts.

Protect data safety and data integrity

Misinformation is damaging. Whether or not we believe in the Butterfly Effect, misinformation can cause a ripple effect that turns into a tidal wave. We need to uphold the sanctity of Truth, and continue to protect data safety and data integrity. The world is already damaged, and it will be damaged even more if we allow misinformation to permeate the fabric of global societies. We may be welcoming a dystopian future, unfortunately.

This blog hopes to shake up the nonchalant way we view “information” and “misinformation” today. There is a famous saying: “Repeat a lie often enough and it becomes the truth“. We must lead the call to combat misinformation. What we do now will shape present and future generations. Preserve Truth.

WaPo “Democracy Dies in Darkness”

[ Condolence: Japan’s Prime Minister, Shinzo Abe, was assassinated last week. News sources mentioned that the man who killed him had information that the slain PM had ties to a religious group that bankrupted his mother. Misinformation may have played a role in the killing of the Japanese leader. ]

Is denying public cloud storage a storm in a political teacup or something more?

Ah, India! The country that gave us the Silicon Valley of Asia in Bengaluru, and exports over USD$150 billion in software and IT services to the world.

Last week, the government of India banned the use of non-sanctioned public cloud storage such as Google® Drive and Dropbox®, plus the use of VPNs (virtual private networks). This is nothing new; China has banned foreign VPN services and Dropbox® for years, while Google® was adjusting its plans for China in 2020, with little hope of doing more than it is allowed to. I am not sure what India’s alternatives are, but China has had its own cloud storage services for a while now. So, what does this all mean?

India bans public cloud storage and VPN services

Public cloud storage services have been a boon for over a decade, since Dropbox® entered the scene in 2008. BYOD (bring your own devices) was a constant on every IT person’s lips at that time. And with the teaser of 2GB or more, many still rely on these public cloud storage services for their ability to sync with tablets, smartphones and laptops. But the proliferation of these services has also propagated many cybersecurity risks, and yes, ransomware can infect these public cloud storage services. Even more noxious, the synchronization of files and folders between these services and on-premises storage devices makes it easy for infected data to spread, often with great efficacy.

Banning these widely available cloud storage applications is more than an inconvenience. Governments like China’s and India’s are shoring up their battlegrounds, as the battle for the protection and privacy of sovereign data will not only escalate but also create a domino effect in the geopolitical contest over the digital landscape.

We have already seen news that India is asserting its stance against China. First, there was an app called “Remove China App” that appeared in the Google® Play Store in 2020. That same year, India’s Ministry of Information Technology banned 59 apps, mostly from China, in order to protect the “sovereignty and integrity of India, defence of India, security of state and public order”.

This is not just a war between two of the world’s most populous nations. Underneath these acts, there are more things to come, and they won’t just involve China and India. We will see other nations follow, with some already working to draw boundaries and demarcate digital borders in the name of data security, privacy, sovereignty and protection.

I hear some foreign vendors lamenting such a move. Most have already either complied with China’s laws or chosen to exit that market. This recent move by India may feel like a storm in a teacup, but beneath it all, the undercurrent is getting stronger each day. A digital geopolitical tempest is percolating and brewing.

Stating the case for a Storage Appliance approach

I was in Indonesia last week to meet with iXsystems™‘ partner PT Maha Data Solusi. I had the wonderful opportunity to meet many people there, and one interesting and often-replayed question arose. Why aren’t iX doing software-defined storage (SDS)? It was a very obvious and deliberate question.

After all, iX already provide free use of the open source TrueNAS® CORE software, which runs on many x86 systems as an SDS solution, and yet commercially, iX sell the TrueNAS® storage appliances.

This argument between the storage appliance model and the software-only storage model has been debated for more than a decade, and it comes into my conversations on and off. I finally want to address it here, with my own views and opinions. And I want to say that I am open to both models because, as a storage consultant, I know both have their pros and cons, advantages and disadvantages. Up front, I gravitate to the storage appliance model, and here’s why.

My story of the storage appliance begins …

Back in the 90s, most of my work was on Fibre Channel and NFS. iSCSI did not exist yet (it was ratified in 2003). I worked almost exclusively on Sun Microsystems® enterprise storage with Sun’s resell of the Veritas® software suite, which included the Veritas® Volume Manager (VxVM), Veritas® Filesystem (VxFS), Veritas® Volume Replicator (VVR) and Veritas® Cluster Server (VCS). I didn’t do much Veritas® NetBackup (NBU), although I was trained at Veritas® in Boston in July 1997 (I remember that 2-week trip fondly). It was just over 2 months after Veritas® acquired OpenVision, whose Backup Plus was what became NetBackup.

Between 1998 and 1999, I spent a lot of time working on Sun NFS servers. The prevalent networking speed at that time was 100Mbit/sec. And I remember having this argument with a Sun partner engineer by the name of Wong Teck Seng. Teck Seng was an inquisitive fella (still is), and he was raving about this purpose-built NFS server he knew about, sharing his experience with me. I dismissed him, brushing aside his always-on tech enthusiasm, and did not think much of a NAS storage appliance. Auspex™ was big then, and I knew of them.

I joined NetApp® as Malaysia’s employee #2. It was an odd first few months working with a storage appliance, but after a couple of months I started to understand and appreciate the philosophy. The storage appliance model made sense to me, and it still does to this day.

Continue reading

Object Storage becoming storage lingua franca of Edge-Core-Cloud

Data Fabric was a big buzzword several years back. I wrote a piece about Data Fabric, mostly NetApp®’s, almost 7 years ago, which I titled “The Transcendence of Data Fabric“. Regardless of storage brands and technology platforms, each with its own version and interpretation, one thing holds true: there must be one layer of Data Singularity. But this is easier said than done.

Fast forward to the present. The latest buzzword is Edge-to-Core-Cloud, or Cloud-to-Core-Edge. The proliferation of Cloud Computing services has spawned multiclouds, superclouds and, of course, Edge Computing. Data is reaching so many premises everywhere, and like water, data has found its way.

Edge-to-Core-to-Cloud (Gratitude thanks to https://www.techtalkthai.com/dell-technologies-opens-iot-solutions-division-and-introduces-distributed-core-architecture/)

The question on my mind is: can we have a single storage platform to serve the Edge-to-Core-to-Cloud paradigm? Is there a storage technology that can be the seamless singularity of data? 7+ years on from my Data Fabric blog, the answer is obvious. Object Storage.

The ubiquitous object storage and the S3 access protocol

For a storage technology that was initially labeled “cheap and deep”, object storage has become immensely popular with developers and cloud storage providers, and is fast becoming the storage repository for data connectors. I wrote a piece called “All the Sources and Sinks going to Object Storage” over a month back, which aptly articulates how far this technology has come.

But unknown to many (Google NASD and little is found), object storage made its presence felt in SNIA in the early 90s as NASD (network attached secure disk), having been developed at Carnegie Mellon University prior to that. As it made its way into the ANSI T10 INCITS standards process, it became known as Object-based Storage Device, or OSD.

The introduction of object storage services 16+ years ago by Amazon Web Services (AWS), via their Simple Storage Service (S3), further strengthened the march of object storage and solidified its status as a top-tier storage platform. It was AWS’ genius to put a REST API over HTTP/HTTPS, with its game-changing approach of using CRUD (create, retrieve, update, delete) operations to work with object storage. Hence the S3 protocol, which has become the de facto access protocol for object storage.
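The elegance is in the mapping itself: each CRUD operation rides on an HTTP verb against a URL that names the bucket and the object key. Here is a toy sketch of my own that renders the request line a client would issue (the real S3 wire protocol also involves request signing and headers, all omitted here; the bucket and key names are made up):

```python
# CRUD operations and the HTTP verbs the S3-style REST API maps them onto.
S3_VERBS = {
    "create": "PUT",      # PUT /bucket/key uploads an object
    "retrieve": "GET",    # GET /bucket/key downloads it
    "update": "PUT",      # objects are immutable; an "update" is a fresh PUT
    "delete": "DELETE",   # DELETE /bucket/key removes it
}

def s3_request_line(operation: str, bucket: str, key: str) -> str:
    """Render the HTTP request line for a CRUD operation (signing omitted)."""
    return f"{S3_VERBS[operation]} /{bucket}/{key} HTTP/1.1"

print(s3_request_line("create", "archive", "2022/report.pdf"))
# -> PUT /archive/2022/report.pdf HTTP/1.1
```

Note that "update" reuses PUT: there is no partial rewrite of an object, which is part of what makes the protocol so simple to implement on any premises.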

Yes, I wrote those 2 blogs 11 and 9 years ago respectively because I saw that object storage technology was a natural fit to the burgeoning new world of storage computing. It has since come true many times over.

Continue reading

As Disk Drive capacity gets larger (and larger), the resilient Filesystem matters

I just got home from the wonderful iXsystems™ Sales Summit in Knoxville, Tennessee. The key highlight was to christen the opening of the iXsystems™ Maryville facility, the key operations center that will house iX engineering, support and part of marketing as well. News of this can be found here.

iX datacenter in the new Maryville facility

Western Digital® has always been a big advocate of iX, and at the Summit, they shared their hard disk drive (HDD), solid state drive (SSD) and other storage platform roadmaps. I felt like a kid in a candy store, because I love all this excitement in the disk drive industry. Who says HDDs are going to be usurped by SSDs?

Several other disk drive manufacturers, including Western Digital®, have announced larger-capacity drives. Here is some news from each vendor in recent months.

Other than the AFR (annualized failure rate) numbers published by Backblaze every quarter, the capacity factor has always been a metric of high interest in the storage industry.
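AFR is a simple ratio, but it is worth seeing what it implies at fleet scale. A back-of-the-envelope sketch (the fleet size and AFR figure below are illustrative placeholders, not Backblaze's actual published numbers):

```python
def expected_annual_failures(fleet_size: int, afr_percent: float) -> float:
    """Expected number of drive failures per year for a fleet at a given AFR."""
    return fleet_size * afr_percent / 100.0

# A hypothetical fleet of 10,000 drives at a 1.4% annualized failure rate:
print(expected_annual_failures(10_000, 1.4))  # -> 140.0
```

At roughly 140 failed drives a year, that is a rebuild somewhere in the fleet every two to three days, which is why larger drive capacities (and their longer rebuild times) make the filesystem's resilience matter so much.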

Continue reading

Unstructured Data Observability with Datadobi StorageMAP

Let’s face it. Data is bursting through its storage seams. And every organization now stores so much data that they don’t know what they have.

By 2025, IDC predicts that 80% of the world’s data will be unstructured. IDC‘s report, Global DataSphere Forecast 2021-2025, projects global data creation and replication expanding to 181 zettabytes, an unfathomable figure. Organizations are inundated. They struggle with data growth, with little understanding of what data they have, where the data resides, what to do with the data, and how to manage the voluminous data deluge.

The simple knee-jerk action is to store it in cloud object storage, where the price of storage is $0.0000xxx/GB/month. But many IT departments in these organizations overlook the fact that the data they park in the cloud requires movement between the cloud and on-premises. I have been involved in numerous discussions where customers realized that the data they had parked in the cloud moved too frequently. Often it was an error of judgement or short-term blindness (blinded by the cheap storage costs, no doubt), further exacerbated by the pandemic. These oversights have resulted in expensive and painful monthly API call and egress fees. Welcome to reality. Suddenly the cheap cloud storage doesn’t sound so cheap after all.
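The arithmetic behind that rude awakening is easy to sketch. The rates below are illustrative placeholders, not any vendor's actual pricing, but the shape of the problem holds: storage is billed cheaply per GB at rest, while egress is billed per GB moved, so data that travels often can dwarf its own storage bill.

```python
def monthly_cloud_bill(tb_stored: float, tb_egressed: float,
                       storage_per_gb: float = 0.004,
                       egress_per_gb: float = 0.09) -> tuple[float, float]:
    """Return (storage_cost, egress_cost) in USD for one month.

    Rates are hypothetical: cheap cold storage per GB/month at rest,
    versus a per-GB fee for every GB pulled back out of the cloud.
    """
    storage_cost = tb_stored * 1024 * storage_per_gb
    egress_cost = tb_egressed * 1024 * egress_per_gb
    return storage_cost, egress_cost

# 100 TB parked in cold storage, but 20 TB pulled back on-premises each month:
storage_cost, egress_cost = monthly_cloud_bill(100, 20)
print(f"storage ${storage_cost:,.0f}/mo vs egress ${egress_cost:,.0f}/mo")
```

With these assumed rates, the egress line is roughly 4.5 times the storage line, and that is before counting the per-request API call charges.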

The same can be said about storing non-active unstructured data on primary storage. Many organizations have not been disciplined enough to practise good data management. The primary Tier 1 storage becomes bloated over time, grinding sluggishly as the data capacity grows. I/O processing becomes painfully slow, and backups take longer and longer. Sound familiar?

The A in ABC

I brought up the ABC mantra a few blogs ago. A is for Archive First. It is part of my data protection consulting practice conversation repertoire, and I use it often to advise IT organizations to be smart with their data management. Before archiving (some folks like to call it tiering, but I am not going down that argument today), we must know what to archive. We cannot blindly send all sorts of junk data to the secondary or tertiary storage premises. If we do that, it is akin to digging another hole to fill up the first hole.

We must know which unstructured data to move, replicate or sync from the Tier 1 storage to a second (or third) less taxing storage premises. We must be able to see this data, observe its behaviour over time, and decide the best data management practice to apply to it. Take note that I said best data management practice and not best storage location in the previous sentence. There has to be a clear distinction: a data management strategy is more prudent than a “best” storage premises. The reason is that many organizations ignorantly think the best storage location (the thought of the “cheapest” always seems to creep up) is a good strategy, while ignoring the fact that data is like water. It moves from premises to premises, from on-prem to cloud, from cloud to other clouds. Data mobility is a variable in data management.
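Knowing what to archive starts with observing the data. Here is a minimal sketch of that idea, my own illustration and nothing like the scale or metadata depth of Datadobi's StorageMAP: walk a directory tree and flag files whose last access time falls outside a cutoff, producing a candidate list for the archive tier.

```python
import os
import time

def archive_candidates(root: str, days_cold: int = 180):
    """Yield (path, size_bytes) for files not accessed within `days_cold` days."""
    cutoff = time.time() - days_cold * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_atime < cutoff:  # last access older than the cutoff
                yield path, st.st_size
```

One caveat for this sketch: it relies on access times, which many filesystems mount with `noatime` or `relatime` for performance, so a real observability tool draws on far richer metadata than a single timestamp.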

Continue reading