The S3 (Simple Storage Service) API has become a de facto standard for accessing object storage. Many vendors claim 100% compatibility with S3, but from what I know, integration and validation work between several file storage services and S3 has revealed otherwise. There are certain nuances that have derailed some of the more advanced integrations. I shall not reveal the ones that I know of, but let us use this thought as the basis of our discussion of Project COSI in this blog.
What is Project COSI?
COSI stands for Container Object Storage Interface. It was introduced as an alpha-stage project in Kubernetes version 1.25 in September 2022, and it remains in alpha today, when the latest version of Kubernetes is 1.26. To understand the objectives of COSI, one must understand the journey and the challenges of persistent storage for containers and Kubernetes.
For me at least, there have been arduous arguments about provisioning a storage repository that keeps the data persistent (and permanent) after the containers in a Kubernetes pod have stopped, or have been replicated to another cluster. For now, many storage vendors in the industry have settled on the CSI (Container Storage Interface) framework when it comes to data persistence using file-based and block-based storage. You can find a long list of CSI drivers here.
However, you would think that since object storage is the most native storage for containers and Kubernetes pods, there would already be a consistent way of accessing object storage services. From the objectives set out by Project COSI, it turns out that there isn’t a standard way to provision and access object storage comparable to the CSI framework for file-based and block-based storage. So the COSI objectives were set to:
- Kubernetes Native – Use the Kubernetes API to provision, configure and manage buckets
- Self Service – A clear delineation between administration and operations (DevOps) to enable self-service capability for DevOps personnel
- Portability – Vendor neutrality enabled through portability across Kubernetes Clusters and across Object Storage vendors
Further details describing Project COSI can be found here at the Kubernetes site, in the post titled “Introducing COSI: Object Storage Management using Kubernetes API”.
Standardization equals technology adoption
Standardization means consistency, control and confidence. The higher the standardization across the storage and containerized apps industry, the higher the adoption of the technology. And given what I have heard from the industry over the past few years, Kubernetes, to me, is even to this day a platform and a framework riddled with many moving parts. Many of the components look the same, feel the same, and sound the same, but might not work the same when deployed.
Therefore, the COSI standardization work is critical to growing this burgeoning segment, especially when we are rocketing towards the disaggregation of computing service units, with resources that can be orchestrated to scale up or down at the execution of code. Infrastructure-as-Code (IaC) is becoming more of a reality with each passing day, and object storage is at the heart of this transformation for Kubernetes and containers.
The big names are in (or not)
Looking at Project COSI, there are already several big vendors staking their claim to develop COSI drivers. Names like IBM®, Red Hat®, Microsoft® and MinIO are leading the list of contributors, but the glaring omission is the Big Kahuna, AWS™. Other cloud object storage vendors such as Backblaze, Cloudflare, Wasabi Technologies® and Google® Cloud are missing from the early bunch. So are other notable on-premises object storage vendors such as Cloudian®, Hitachi Vantara®, Scality (although I found early development work by Scality on GitHub from 2 years ago, I do not know what Scality’s involvement in the project is at present) and VMware® (also involved early on, but I am unaware of any present contribution).
The project is still in its alpha stage. It will take time for it to grow legs, and hopefully the vendors I mentioned will stake their claim and contribute to the project for widespread adoption.
Deep(er) Dive into COSI Architecture
I cannot resist taking a deeper dive into storage architecture. It is my nature to learn and share. COSI is still very, very new to me, although I have been aware of it since its announcement 2 years ago. This is an opportunity to learn deep(er).
Unlike CSI for file-based and block-based storage, the concept of “mounting” or “attaching” the storage repository is alien to object storage. Object storage is accessed over the network with the HTTP/S protocol without the mounting bit, and has more refined parameters that Kubernetes and containers can take advantage of. Thus, a new set of Kubernetes APIs is being developed to take advantage of these capabilities, along with standardized ways for storage vendors to develop object storage access using gRPC (the remote procedure call framework originally developed at Google) methods. More about the COSI specs is listed here.
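To make the "no mounting" point concrete, here is a minimal Python sketch of what object access looks like from an application: an object is addressed by a URL and written or read with plain HTTP verbs. The endpoint, bucket name and helper functions (`object_url`, `put_request`, `get_request`) are hypothetical names of my own, the path-style URL layout is an assumption, and the requests are only constructed, not sent. A real S3-compatible service would also expect signed authentication headers, which are left out here.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style object URL: <endpoint>/<bucket>/<key>."""
    # Percent-encode the key but keep "/" so key prefixes stay readable.
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key, safe='/')}"


def put_request(endpoint: str, bucket: str, key: str, body: bytes) -> Request:
    """Describe an object upload as an HTTP PUT (not sent, no auth headers)."""
    return Request(object_url(endpoint, bucket, key), data=body, method="PUT")


def get_request(endpoint: str, bucket: str, key: str) -> Request:
    """Describe an object download as an HTTP GET (not sent, no auth headers)."""
    return Request(object_url(endpoint, bucket, key), method="GET")


if __name__ == "__main__":
    # Hypothetical endpoint and bucket, for illustration only.
    endpoint = "https://objects.example.com"
    upload = put_request(endpoint, "app-logs", "2022/09/run 1.txt", b"hello")
    download = get_request(endpoint, "app-logs", "2022/09/run 1.txt")
    print(upload.get_method(), upload.full_url)
    print(download.get_method(), download.full_url)
```

There is no device, filesystem or mount point anywhere in this flow, which is why the attach and mount steps that CSI standardizes have no natural counterpart for object storage.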
The focus of COSI is to develop common methodologies that work at the granularity of the bucket. Different terminologies are defined, such as bucket, bucket access (BA), bucket request (BR) and bucket class (BC), in order to stamp consistency into the bucket structure. The relationship between these structures is shown in the diagram below:
More information about bucket relationships can be found here.
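As a rough aid to reading the bucket terminology, the relationships can be sketched as a toy in-memory model in Python. This is only my simplified interpretation, not the COSI API: in COSI these are Kubernetes resources handled by a controller and a vendor driver, and every field name, the `provision` and `grant_access` helpers, and the bucket naming scheme below are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BucketClass:
    """Admin-defined template: which driver provisions buckets, and how."""
    name: str
    driver: str  # hypothetical driver identifier
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class BucketRequest:
    """User-side request for a bucket, naming the class to provision from."""
    name: str
    namespace: str
    bucket_class_name: str


@dataclass
class Bucket:
    """The provisioned bucket, created on behalf of a bucket request."""
    name: str
    bucket_class_name: str
    requested_by: str


@dataclass
class BucketAccess:
    """Links a workload's credentials to a provisioned bucket."""
    bucket_name: str
    bucket_request_name: str
    credentials_secret: str


def provision(request: BucketRequest, classes: Dict[str, BucketClass]) -> Bucket:
    """Resolve the request's bucket class and create a bucket from it."""
    if request.bucket_class_name not in classes:
        raise LookupError(f"unknown bucket class: {request.bucket_class_name}")
    bucket_class = classes[request.bucket_class_name]
    # Assumed naming scheme, chosen only so the example is deterministic.
    bucket_name = f"{bucket_class.name}-{request.namespace}-{request.name}"
    return Bucket(bucket_name, bucket_class.name, f"{request.namespace}/{request.name}")


def grant_access(request: BucketRequest, bucket: Bucket, secret: str) -> BucketAccess:
    """Record that the holder of `secret` may use the bucket behind `request`."""
    return BucketAccess(bucket.name, request.name, secret)


if __name__ == "__main__":
    classes = {"standard": BucketClass("standard", "driver.example.com")}
    request = BucketRequest("logs", "team-a", "standard")
    bucket = provision(request, classes)
    print(bucket)
    print(grant_access(request, bucket, "logs-credentials"))
```

The split mirrors the self-service objective: an administrator defines the bucket class once, and DevOps personnel only ever submit bucket requests and ask for bucket access against it.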
On the object storage vendor side, a design model has been put forward with a set of methodologies for the user-facing and application-facing front end, and with the object storage definitions for provisioning and management at the back end. A topology linking the COSI controller to the bucket framework described earlier is depicted in the diagram below:
As the conversation about superclouds or metaclouds takes shape, the importance of a standardized architecture for the provisioning and management of object storage services cannot be overstated. Kubernetes is the leading technology for making superclouds possible. Movement of workloads without friction will define the superclouds of tomorrow, and it is important to see why. COSI is all about scalability and portability, to create frictionless relationships when working with multiclouds.
We talk about workload resources within present-day cloud computing. With the COSI standardization efforts, and the distributed nature of object storage services, you can see how object storage could address the challenges of traditional file-based and block-based storage. This could be on its way to making superclouds a reality.