Beyond the WORM with MinIO object storage

I find the terminology of WORM (Write Once Read Many) coming back into the IT speak in recent years. In the era of rip and burn, WORM was a natural thing where many of us “youngsters” used to copy files to a blank CD or DVD. I got know about how WORM worked when I learned that the laser in the CD burning process alters the chemical compound in a segment on the plastic disc of the CD, rendering the “burned” segment unwritable once it was written but it could be read many times.

At the enterprise level, I got to know about WORM while working with tape drives and tape libraries in the mid-90s. The objective of WORM is to save and archive the data and files in a non-rewritable format for compliance reasons. And it was the data compliance and data protection parts that got me interested into data management. WORM is a big deal in many heavily regulated industries such as finance and banking, insurance, oil and gas, transportation and more.

Obviously things have changed. WORM, while very much alive in the ageless tape industry, has another up-and-coming medium in Object Storage. The new generation of data infrastructure and data management specialists are starting to take notice.

Worm Storage – Image from Hubstor (https://www.hubstor.net/blog/write-read-many-worm-compliant-storage/)

I take this opportunity to take MinIO object storage for a spin in creating WORM buckets which can be easily architected as data compliance repositories with many applications across regulated industries. Here are some relevant steps.

Continue reading

Truthful information under attack. The call for Data Preservation

The slogan of The Washington Post is “Democracy Dies in Darkness“. Although not everyone agrees with the US brand of democracy, the altruism of WaPo‘s (the publication’s informal name) slogan is a powerful one. The venerable newspaper remains the beacon in the US as one of the most trustworthy sources of truthful, honest information.

4 Horsemen of Apocalypse with the 5th joining

Misinformation

Misinformation has become a clear and present danger to humanity. Fake news, misleading information, lies are fueling and propelling the propaganda and agenda of the powerful (and the deranged). Facts are blurred, obfuscated, and even removed and replaced with misinformation to push for the undesirable effects that will affect the present and future generations.

The work of SNIA®

Data preservation is part of Data Management. More than a decade ago, SNIA® has already set up a technical work group (TWG) on Long Term Retention and proposed a format for long-term storage of digital format. It was called SIRF (Self-contained Information Retention Format). In the words of SNIA®, “The SIRF format enables long-term physical storage, cloud storage and tape-based containers effective and efficient ways to preserve and secure digital information for many decades, even with the ever-changing technology landscape.”

I don’t think battling misinformation was SNIA®’s original intent, but the requirements for a vendor-neutral organization as such to present and promote long term data preservation is more needed than ever. The need to protect the truth is paramount.

SNIA® continues to work with many organizations to create and grow the ecosystem for long term information retention and data preservation.

NFTs can save data

Despite the hullabaloo of NFTs (non-fungible tokens), which is very much soiled and discredited by the present day cryptocurrency speculations, I view data (and metadata) preservation as a strong use case for NFTs. The action is to digitalize data into an NFT asset.

Here are a few arguments:

  1. NFTs are unique. Once they are verified and inserted into the blockchain, they are immutable. They cannot be modified, and each blockchain transaction is created with one never to be replicated hashed value.
  2. NFTs are decentralized. Most of the NFTs we know of today are minted via a decentralized process. This means that the powerful cannot (most of the time), effect the NFTs state according to its whims and fancies. Unless the perpetrators know how to manipulate a Sybil attack on the blockchain.
  3. NFTs are secure. I have to set the knowledge that NFTs in itself is mostly very secure. Most of the high profiled incidents related to NFTs are more of internal authentication vulnerabilities and phishing related to poor security housekeeping and hygiene of the participants.
  4. NFTs represent authenticity. The digital certification of the NFTs as a data asset also define the ownership and the originality as well. The record of provenance is present and accounted for.

Since NFTs started as a technology to prove the assets and artifacts of the creative industry, there are already a few organizations that playing the role. Orygin Art is one that I found intriguing. Museums are also beginning to explore the potential of NFTs including validating and verifying the origins of many historical artifacts, and digitizing these physical assets to preserve its value forever.

The technology behind NFTs are not without its weaknesses as well but knowing what we know today, the potential is evident and power of the technology has yet to be explored fully. It does present a strong case in preserving the integrity of truthful data, and the data as historical artifacts.

Protect data safety and data integrity

Misinformation is damaging. Regardless if we believe the Butterfly Effect or not, misinformation can cause a ripple effect that could turn into a tidal wave. We need to uphold the sanctity of Truth, and continue to protect data safety and data integrity. The world is already damaged, and it will be damaged even more if we allow misinformation to permeate into the fabric of the global societies. We may welcome to a dystopian future, unfortunately.

This blog hopes to shake up the nonchalant state that we view “information” and “misinformation” today. There is a famous quote that said “Repeat a lie often enough and it becomes the truth“. We must lead the call to combat misinformation. What we do now will shape the generations of our present and future. Preserve Truth.

WaPo “Democracy Dies in Darkness”

[ Condolence: Japan Prime Minister, Shinzo Abe, was assassinated last week. News sources mentioned that the man who killed him had information that the slain PM has ties to a religious group that bankrupted his mother. Misinformation may played a role in the killing of the Japanese leader. ]

Object Storage becoming storage lingua franca of Edge-Core-Cloud

Data Fabric was a big buzzword going back several years. I wrote a piece talking about Data Fabric, mostly NetApp®’s,  almost 7 years ago, which I titled “The Transcendence of Data Fabric“. Regardless of storage brands and technology platforms, and each has its own version and interpretations, one thing holds true. There must be a one layer of Data Singularity. But this is easier said than done.

Fast forward to present. The latest buzzword is Edge-to-Core-Cloud or Cloud-to-Core-Edge. The proliferation of Cloud Computing services, has spawned beyond to multiclouds, superclouds and of course, to Edge Computing. Data is reaching to so many premises everywhere, and like water, data has found its way.

Edge-to-Core-to-Cloud (Gratitude thanks to https://www.techtalkthai.com/dell-technologies-opens-iot-solutions-division-and-introduces-distributed-core-architecture/)

The question on my mind is can we have a single storage platform to serve the Edge-to-Core-to-Cloud paradigm? Is there a storage technology which can be the seamless singularity of data? 7+ years onwards since my Data Fabric blog, The answer is obvious. Object Storage.

The ubiquitous object storage and the S3 access protocol

For a storage technology that was initially labeled “cheap and deep”, object storage has become immensely popular with developers, cloud storage providers and is fast becoming storage repositories for data connectors. I wrote a piece called “All the Sources and Sinks going to Object Storage” over a month back, which aptly articulate how far this technology has come.

But unknown to many (Google NASD and little is found), object storage started its presence in SNIA (it was developed in Carnegie-Mellon University prior to that) in the early 90s, then known as NASD (network attached secure disk). As it is made its way into the ANSI T10 INCITS standards development, it became known as Object-based Storage Device or OSD.

The introduction of object storage services 16+ years ago by Amazon Web Services (AWS) via their Simple Storage Services (S3) further strengthened the march of object storage, solidified its status as a top tier storage platform. It was to AWS’ genius to put the REST API over HTTP/HTTPS with its game changing approach to use CRUD (create, retrieve, update, delete) operations to work with object storage. Hence the S3 protocol, which has become the de facto access protocol to object storage.

Yes, I wrote those 2 blogs 11 and 9 years ago respectively because I saw that object storage technology was a natural fit to the burgeoning new world of storage computing. It has since come true many times over.

Continue reading

Backup – Lest we forget

World Backup Day – March 31st

Last week was World Backup Day. It is on March 31st every year so that you don’t lose your data and become an April’s Fool the next day.

Amidst the growing awareness of the importance of backup, no thanks to the ever growing destructive nature of ransomware, it is important to look into other aspects of data protection – both a data backup/recovery and a data security –  point of view as well.

3-2-1 Rule, A-B-C and Air Gaps

I highlighted the basic 3-2-1 rule before. This must always be paired with a set of practised processes and policies to cultivate all stakeholders (aka the people) in the organization to understand the importance of protecting the data and ensuring data recoverability.

The A-B-C is to look at the production dataset and decide if the data should be stored in the Tier 1 storage. In most cases, the data becomes less active and these datasets may be good candidates to be archived. Once archived, the production dataset is smaller and data backup operations become lighter, faster and have positive causation as well.

Air gaps have returned to prominence since the heightened threats on data in recent years. The threats have pushed organizations to consider doing data offsite and offline with air gaps. Cost considerations and speed of recovery can be of concerns, and logical air gaps are also gaining style as an acceptable extra layer of data. protection.

Backup is not total Data Protection cyberdefence

If we view data protection more holistically and comprehensively, backup (and recovery) is not the total data protection solution. We must ignore the fancy rhetorics of the technology marketers that backup is the solution to ensure data protection because there is much more than that.

The well respected NIST (National Institute of Standards and Technology) Cybersecurity Framework places Recovery (along with backup) as the last pillar of its framework.

NIST Cybersecurity Framework

Continue reading

Nakivo Backup Replication architecture and installation on TrueNAS – Part 1

Backup and Replication software have received strong mandates in organizations with enterprise mindsets and vision. But lower down the rung, small medium organizations are less invested in backup and replication software. These organizations know full well that they must backup, replicate and protect their servers, physical and virtual, and also new workloads in the clouds, given the threat of security breaches and ransomware is looming larger and larger all the time. But many are often put off by the cost of implementing and deploying a Backup and Replication software.

So I explored one of the lesser known backup and recovery software called Nakivo® Backup and Replication (NBR) and took the opportunity to build a backup and replication appliance in my homelab with TrueNAS®. My objective was to create a cost effective option for small medium organizations to enjoy enterprise-grade protection and recovery without the hefty price tag.

This blog, Part 1, writes about the architecture overview of Nakivo® and the installation of the NBR software in TrueNAS® to bake in and create the concept of a backup and replication appliance. Part 2, in a future blog post, will cover the administrative and operations usage of NBR.

Continue reading

Storage Elephant Compute Birds

Data movement is expensive. Not just costs, but also latency and resources as well. Thus there were many narratives to move compute closer to where the data is stored because moving compute is definitely more economical than moving data. I borrowed the analogy of the 2 animals from some old NetApp® slides which depicted storage as the elephant, and compute as birds. It was the perfect analogy, because the storage is heavy and compute is light.

“Close up of a white Great Egret perching on top of an African Elephant aa Amboseli national park, Kenya”

Before the animals representation came about I used to use the term “Data locality, Data Mobility“, because of past work on storage technology in the Oil & Gas subsurface data management pipeline.

Take stock of your data movement

I had recent conversations with an end user who has been paying a lot of dollars keeping their “backup” and “archive” in AWS Glacier. The S3 storage is cheap enough to hold several petabytes of data for years, because the IT folks said that the data in AWS Glacier are for “backup” and “archive”. I put both words in quotes because they were termed as “backup” and “archive” because of their enterprise practice. However, the face of their business is changing. They are in manufacturing, oil and gas downstream, and the definitions of “backup” and “archive” data has changed.

For one, there is a strong demand for reusing the past data for various reasons and these datasets have to be recalled from their cloud storage. Secondly, their data movement activities still mimicked what they did in the past during their enterprise storage days. It was a classic lift-and-shift when they moved to the cloud, and not taking stock of  their data movements and the operations they ran on these datasets. Still ongoing, their monthly AWS cost a bomb.

Continue reading

What happened to NDMP?

The acronym NDMP shows up once in a while in NAS (Network Attached Storage) upgrade tenders. And for the less informed, NDMP (Network Data Management Protocol) was one of the early NAS data management (more like data mover specifications) initiatives to backup NAS devices, especially the NAS appliances that run proprietary operating systems code.

NDMP Logo

Backup software vendors often have agents developed specifically for an operating system or an operating environment. But back in the mid-1990s, 2000s, the internal file structures of these proprietary vendors were less exposed, making it harder for backup vendors to develop agents for them. Furthermore, there was a need to simplify the data movements of NAS files between backup servers and the NAS as a client, to the media servers and eventually to the tape or disk targets. The dominant network at the time ran at 100Mbits/sec.

To overcome this, Network Appliance® and PDC Solutions/Legato® developed the NDMP protocol, allowing proprietary NAS devices to run a standardized client-server architecture with the NDMP server daemon in the NAS and the backup service running as an NDMP client. Here is a simplified look at the NDMP architecture.

NDMP Client-Server Architecture

Continue reading

SSOT of Files

[ This is part two of “Where are your files living now?”. You can read Part One here ]

Data locality, Data mobility“. It was a term I like to use a lot when describing about data consolidation, leading to my mention about files and folders, and where they live in my previous blog. The thinking of where the files and folders are now as in everywhere as they can be in a plethora of premises stretches the premise of SSOT (Single Source of Truth). And this expatriation of files with minimal checks and balances disturbs me.

A year ago, just before I joined iXsystems, I was given Google® embargoed news, probably a week before they announced BigQuery Omni. Then I was interviewed by Enterprise IT News, a local Malaysian technology news portal to provide an opinion quote. This was what I quoted:

“’The data warehouse in the cloud’ managed services of Big Query is underpinned by Google® Anthos, its hybrid cloud infra and service management platform based on GKE (Google® Kubernetes Engine). The containerised applications, both on-prem and in the multi-clouds, would allow Anthos to secure and orchestrate infra, services and policy management under one roof.”

I further quoted ” The data repositories remain in each cloud is good to address data sovereignty, data security concerns but it did not mention how it addresses “single source of truth” across multi-clouds.

Single Source of Truth – regardless of repositories

Continue reading

Resilient Integrated Data Protection against Ransomware

Early in the year, I wrote about NAS systems being a high impact target for ransomware. I called NAS a goldmine for ransomware. This is still very true because NAS systems are the workhorses of many organizations. They serve files and folders and from it, the sharing and collaboration of Work.

Another common function for NAS systems is being a target for backups. In small medium organizations, backup software often direct their backups to a network drive in the network. Even for larger enterprise customers too, NAS is the common destination for backups.

Backup to NAS system

Typical NAS backup for small medium organizations.

Backup to Data Domain with NAS Protocols

Backup to Data Domain with NAS (NFS, CIFS) Protocols

Ransomware is obviously targeting the backup as another high impact target, with the potential to disrupt the rescue and the restoration of the work files and folders.

Continue reading