Understanding security practices in File Synchronization

Ho hum. Another day, and another data leak. What else is new?

The latest hullabaloo in my radar was from one of Malaysia’s reverent universities, UiTM, which reported a data leak of 11,891 student applicants’ private details including MyKad (national identity card) numbers of each individual. Reading from the news article, one can deduced that the unsecured link mentioned was probably from a cloud storage service, i.e. file synchronization software such as OneDrive, Google Drive, Dropbox, etc. Those files that can be easily shared via an HTTP/S URL link. Ah, convenience over the data security best practices. 

Cloud File Sync software

It irks me when data security practices are poorly practised. And it is likely that there is ignorance of data security practices in the first place.

It also irks me when many end users everywhere I have encountered tell me their file synchronization software is backup. That is just a very poor excuse of a data protection strategy, if any, especially in enterprise and cloud environments. Convenience, set-and-forget mentality. Out of sight. Out of mind. Right? 

Convenience is not data security. File Sync is NOT Backup

Many users are used to the convenience of file synchronization. The proliferation of cloud storage services with free Gigabytes here and there have created an IT segment based on BYOD, which transformed into EFSS, and now CCP. The buzzword salad involves the Bring-Your-Own-Device, which evolved into Enterprise-File-Sync-&-Share, and in these later years, Content-Collaboration-Platform.

All these are fine and good. The data industry is growing up, and many are leveraging the power of file synchronization technologies, be it on on-premises and from cloud storage services. Organizations, large and small, are able to use these file synchronization platforms to enhance their businesses and digitally transforming their operational efficiencies and practices. But what is sorely missing in embracing the convenience and simplicity is the much ignored cybersecurity housekeeping practices that should be keeping our files and data safe.

Continue reading

Project COSI

The S3 (Simple Storage Service) has become a de facto standard for accessing object storage. Many vendors claim 100% compatibility to S3, but from what I know, several file storage services integration and validation with the S3 have revealed otherwise. There are certain nuances that have derailed some of the more advanced integrations. I shall not reveal the ones that I know of, but let us use this thought as a basis of our discussion for Project COSI in this blog.

Project COSI high level architecture

What is Project COSI?

COSI stands for Container Object Storage Interface. It is still an alpha stage project in Kubernetes version 1.25 as of September 2022 whilst the latest version of Kubernetes today is version 1.26. To understand the objectives COSI, one must understand the journey and the challenges of persistent storage for containers and Kubernetes.

For me at least, there have been arduous arguments of provisioning a storage repository that keeps the data persistent (and permanent) after containers in a Kubernetes pod have stopped, or replicated to another cluster. And for now, many storage vendors in the industry have settled with the CSI (container storage interface) framework when it comes to data persistence using file-based and block-based storage. You can find a long list of CSI drivers here.

However, you would think that since object storage is the most native storage to containers and Kubernetes pods, there is already a consistent way to accessing object storage services. From the objectives set out by Project COSI, turns out that there isn’t a standard way to provision and accessing object storage as compared to the CSI framework for file-based and block-based storage. So the COSI objectives were set to:

  • Kubernetes Native – Use the Kubernetes API to provision, configure and manage buckets
  • Self Service – A clear delineation between administration and operations (DevOps) to enable self-service capability for DevOps personnel
  • Portability – Vendor neutrality enabled through portability across Kubernetes Clusters and across Object Storage vendors

Further details describing Project COSI can be found here at the Kubernetes site titled “Introducing COSI: Object Storage Management using Kubernetes API“.

Standardization equals technology adoption

Standardization means consistency, control, confidence. The higher the standardization across the storage and containerized apps industry, the higher the adoption of the technology. And given what I have heard from the industry over these few years, Kubernetes, to me, even till this day, is a platform and a framework that are filled and riddled with so many moving parts. Many of the components looks the same, feels the same, and sounds the same, but might not work out the same when deployed.

Therefore, the COSI standardization work is important and critical to grow this burgeoning segment, especially when we are rocketing towards disaggregation of computing service units, resources that be orchestrated to scale up or down at the execution of codes. Infrastructure-as-Code (IAC) is becoming a reality more and more with each passing day, and object storage is at the heart of this transformation for Kubernetes and containers.

Continue reading

Open Source on my mind

Last week was cropped with topics around Open Source software. I want to voice my opinions here (with a bit of ranting) and hoping not to rouse many abhorrent comments from different parties and views. This blog is to create conversations, even controversial ones, but we must first agree that there will be disagreements. We must accept disagreements as part of this conversation.

In my 30 years career, Open Source has been a big part of my development and progress. The ideas of freely using (certain) software without any licensing implications and these software being openly available were not always welcomed, as they are now. I think the Open Source revolution has created an innovation movement that is still going strong, and it has not only permeated completely into the IT industry, Open Source has also now in almost every part of the technology-based industries as well. The Open Source influence is massive.

Open Source word cloud

In the beginning

In the beginning, in my beginning in 1992, the availability of software and its source codes was a closed one. Coming from a VAX/VMS background (I was a system admin in my mathematics department’s mini computers), Unix liberated my thinking. The final 6 months in the university was systems programming in C, and it completely changed how I wanted my career to shape. The mantra of “Free as in Freedom” in General Public License GPL (which I got know of much later) boded well with my own tenets in life.

If closed source development models led to proprietary software and a centralized way to distributing software with license, I would count the Open Source development models as one of the earliest decentralized technology frameworks. Down with the capitalistic corporations (aka Evil Empires)!

It was certainly a wonderful and generous way to make the world that it is today. It is a better world now.

Continue reading

Beyond the WORM with MinIO object storage

I find the terminology of WORM (Write Once Read Many) coming back into the IT speak in recent years. In the era of rip and burn, WORM was a natural thing where many of us “youngsters” used to copy files to a blank CD or DVD. I got know about how WORM worked when I learned that the laser in the CD burning process alters the chemical compound in a segment on the plastic disc of the CD, rendering the “burned” segment unwritable once it was written but it could be read many times.

At the enterprise level, I got to know about WORM while working with tape drives and tape libraries in the mid-90s. The objective of WORM is to save and archive the data and files in a non-rewritable format for compliance reasons. And it was the data compliance and data protection parts that got me interested into data management. WORM is a big deal in many heavily regulated industries such as finance and banking, insurance, oil and gas, transportation and more.

Obviously things have changed. WORM, while very much alive in the ageless tape industry, has another up-and-coming medium in Object Storage. The new generation of data infrastructure and data management specialists are starting to take notice.

Worm Storage – Image from Hubstor (https://www.hubstor.net/blog/write-read-many-worm-compliant-storage/)

I take this opportunity to take MinIO object storage for a spin in creating WORM buckets which can be easily architected as data compliance repositories with many applications across regulated industries. Here are some relevant steps.

Continue reading

Object Storage becoming storage lingua franca of Edge-Core-Cloud

Data Fabric was a big buzzword going back several years. I wrote a piece talking about Data Fabric, mostly NetApp®’s,  almost 7 years ago, which I titled “The Transcendence of Data Fabric“. Regardless of storage brands and technology platforms, and each has its own version and interpretations, one thing holds true. There must be a one layer of Data Singularity. But this is easier said than done.

Fast forward to present. The latest buzzword is Edge-to-Core-Cloud or Cloud-to-Core-Edge. The proliferation of Cloud Computing services, has spawned beyond to multiclouds, superclouds and of course, to Edge Computing. Data is reaching to so many premises everywhere, and like water, data has found its way.

Edge-to-Core-to-Cloud (Gratitude thanks to https://www.techtalkthai.com/dell-technologies-opens-iot-solutions-division-and-introduces-distributed-core-architecture/)

The question on my mind is can we have a single storage platform to serve the Edge-to-Core-to-Cloud paradigm? Is there a storage technology which can be the seamless singularity of data? 7+ years onwards since my Data Fabric blog, The answer is obvious. Object Storage.

The ubiquitous object storage and the S3 access protocol

For a storage technology that was initially labeled “cheap and deep”, object storage has become immensely popular with developers, cloud storage providers and is fast becoming storage repositories for data connectors. I wrote a piece called “All the Sources and Sinks going to Object Storage” over a month back, which aptly articulate how far this technology has come.

But unknown to many (Google NASD and little is found), object storage started its presence in SNIA (it was developed in Carnegie-Mellon University prior to that) in the early 90s, then known as NASD (network attached secure disk). As it is made its way into the ANSI T10 INCITS standards development, it became known as Object-based Storage Device or OSD.

The introduction of object storage services 16+ years ago by Amazon Web Services (AWS) via their Simple Storage Services (S3) further strengthened the march of object storage, solidified its status as a top tier storage platform. It was to AWS’ genius to put the REST API over HTTP/HTTPS with its game changing approach to use CRUD (create, retrieve, update, delete) operations to work with object storage. Hence the S3 protocol, which has become the de facto access protocol to object storage.

Yes, I wrote those 2 blogs 11 and 9 years ago respectively because I saw that object storage technology was a natural fit to the burgeoning new world of storage computing. It has since come true many times over.

Continue reading

Unstructured Data Observability with Datadobi StorageMAP

Let’s face it. Data is bursting through its storage seams. And every organization now is storing too much data that they don’t know they have.

By 2025, IDC predicts that 80% the world’s data will be unstructured. IDC‘s report Global Datasphere Forecast 2021-2025 will see the global data creation and replication capacity expand to 181 zettabytes, an unfathomable figure. Organizations are inundated. They struggle with data growth, with little understanding of what data they have, where the data is residing, what to do with the data, and how to manage the voluminous data deluge.

The simple knee-jerk action is to store it in cloud object storage where the price of storage is $0.0000xxx/GB/month. But many IT departments in these organizations often overlook the fact that that the data they have parked in the cloud require movement between the cloud and on-premises. I have been involved in numerous discussions where the customers realized that they moved the data in the cloud moved too frequently. Often it was an erred judgement or short term blindness (blinded by the cheap storage costs no doubt), further exacerbated by the pandemic. These oversights have resulted in expensive and painful monthly API calls and egress fees. Welcome to reality. Suddenly the cheap cloud storage doesn’t sound so cheap after all.

The same can said about storing non-active unstructured data on primary storage. Many organizations have not been disciplined to practise good data management. The primary Tier 1 storage becomes bloated over time, grinding sluggishly as the data capacity grows. I/O processing becomes painfully slow and backup takes longer and longer. Sounds familiar?

The A in ABC

I brought up the ABC mantra a few blogs ago. A is for Archive First. It is part of my data protection consulting practice conversation repertoire, and I use it often to advise IT organizations to be smart with their data management. Before archiving (some folks like to call it tiering, but I am not going down that argument today), we must know what to archive. We cannot blindly send all sorts of junk data to the secondary or tertiary storage premises. If we do that, it is akin to digging another hole to fill up the first hole.

We must know which unstructured data to move replicate or sync from the Tier 1 storage to a second (or third) less taxing storage premises. We must be able to see this data, observe its behaviour over time, and decide the best data management practice to apply to this data. Take note that I said best data management practice and not best storage location in the previous sentence. There has to be a clear distinction that a data management strategy is more prudent than to a “best” storage premises. The reason is many organizations are ignorantly thinking the best storage location (the thought of the “cheapest” always seems to creep up) is a good strategy while ignoring the fact that data is like water. It moves from premises to premises, from on-prem to cloud, cloud to other cloud. Data mobility is a variable in data management.

Continue reading

Ridding consumer storage mindset for Enterprise operations

I cut my teeth in Enterprise Storage for 3 decades. On and off, I get the opportunity to work on Cloud Storage as well, mostly more structured storage infrastructure services such as blocks and files, in cloud offerings on AWS, Azure and Alibaba Cloud. I am familiar with S3 operations (mostly the CRUD operations and HTTP headers stuff) too, although I have yet to go deep with S3 with Restful API. And I really wanted to work on stuff with the S3 Select when the opportunity arises. (Note: Homelab project to-do list)

Along with the experience is the enterprise mindset of designing and crafting storage infrastructure and data management practices that evolve around data. Understanding the characteristics of data and the behaviours data in motion is part of my skills repertoire, and I continue to have conversations with organizations, small and large alike every day of the week.

This week’s blog was triggered by an article by Tech Republic® Jack Wallen‘s interview with Fedora project leader Matthew Miller. I have been craning my neck waiting for the full release of Fedora 36 (which now has been pushed to May 10th 2022), and the Tech Republic®’s article, “The future of Linux: Fedora project leader weighs in” touched me. Let me set the context of my expanded commentaries here.

History of my open source experience- bringing Enterprise to the individual

I have been working with open source software for a long time. My first Linux experience was Soft Landing Linux in the early 90s. It was a bunch of diskettes I purchased online while dabbling with FreeBSD® on the sides. Even though my day job was on the SunOS, and later Solaris®, having the opportunity to build stuff and learn the enterprise ways with Sun Microsystems® hardware and software were difficult at my homelab. I did bring home a SPARCstation® 2 once but the CRT monitor almost broke my computer table at that time.

Having open source software on 386i (before x86) architecture was great (no matter how buggy they were) because I got to learn hardcore enterprise technology at home. I am a command line person, so the desktop experience does not bother me much because my OS foundation is there. Open source gave me a world I could master my skills as an individual. For an individual like me, my mindset is always on the Enterprise.

The Tech Republic interview and my reflections

I know the journey open source OSes has taken at the server (aka Enterprise) level. They are great, and are getting better and better. But at the desktop (aka consumer) level, the Linux desktop experience has been an arduous one even though the open source Linux desktop experience is so much better now. This interview reflected on that.

There were a few significant points that were brought up. Those poignant moments explained about the free software in open source projects, how consumers glazed over (if I get what Matt Miller meant) the cosmetics of the open source software without the deeper meaningful objectives of the software had me feeling empty. Many assumed that just because the software is open source, it should be free or of low costs and continue to apply a consumer mindset to the delivery and the capability of the software.

Case in point is the way I have been seeing many TrueNAS®/FreeNAS™ individuals who downloaded the free software and using them in consumer ways. That is perfectly fine but when they want to migrate their consumer experience with the TrueNAS® software to their critical business operations, things suddenly do not look so rosy anymore. From my experience, having built enterprise-grade storage solutions with open source software like ZFS on OpenSolaris/OpenIndiana, FreeNAS™ and TrueNAS® for over a decade plus gaining plenty of experience on many proprietary and software-defined storage platforms along this 30 year career, the consumer mindsets do not work well in enterprise missions.

And over the years, I have been seeing this newer generation of infrastructure people taking less and less interest in learning the enterprise ways or going deep dive into the workings of the open source platforms I have mentioned. Yet, they have lofty enterprise expectations while carrying a consumer mindset. More and more, I am seeing a greying crew of storage practitioners with enterprise experiences dealing with a new generation of organizations and end users with consumer practices and mindsets.

Open Source Word Cloud

Continue reading

All the Sources and Sinks going to Object Storage

The vocabulary of sources and sinks are beginning to appear in the world of data storage as we witness the new addition of data processing frameworks and the applications in this space. I wrote about this in my blog “Rethinking data. processing frameworks systems in real time” a few months ago, introducing my take on this budding new set of I/O characteristics and data ecosystem. I also started learning about the Kappa Architecture (and Lambda as well), a framework designed to craft and develop a set of amalgamated technologies to handle stream processing of a series of data in relation to time.

In Computer Science, sources and sinks are considered external entities that often serve as connectors of input and output of disparate systems. They are often not in the purview of data storage architects. Also often, these sources and sinks are viewed as black boxes, and their inner workings are hidden from the views of the data storage architects.

Diagram from https://developer.here.com/documentation/get-started/dev_guide/shared_content/topics/olp/concepts/pipelines.html

The changing facade of data stream processing presents the constant motion of data, the continuous data being altered as it passes through the many integrated sources and sinks. We are also see much of the data processed in-memory as much as possible. Thus, the data services from a traditional storage model of SAN and NAS may straggle with the requirements demanded by this new generation of data stream processing.

As the world of traditional data storage processing is expanding into data streams processing and vice versa, and the chatter of sources and sinks can no longer be ignored.

Continue reading

The young report card on Decentralized Storage

I kept this blog in my queue for over 4 months. I was reluctant to publish it because I thought the outrageous frenzies of NFTs (non-fungible tokens), metaverses and web3 were convoluting the discussions on the decentralized storage topic. 3 weeks back, a Google Trends search for these 3 opaque terms over 90 days showed that the worldwide fads were waning. Here was the Google Trends output on April 2, 2022:

Google Trends on NFT, metaverse and web3

Decentralized storage intrigues me. I like to believe in its potential and I often try to talk to people to strengthen the narratives, and support its adoption where it fits. But often, the real objectives of decentralized storage are obfuscated by the polarized conversations about cryptocurrencies that are pegged to their offerings, NFTs (non-fungible tokens), DAOs (decentralized autonomous organizations) and plenty of hyperboles with bewildering facts as well.

But I continue to seek sustainable conversations about decentralized storage without the sway of the NFTs or the cryptos. After dipping in my toes and experiencing with HODLers, and looking at the return to sanity, I believe we can discuss decentralized storage with better clarity now. The context is to position decentralized storage to the mainstream, specifically to business organizations already immersed in centralized storage. Here is my fledgling report card on decentralized storage.

Continue reading