To win in the multi-cloud game, you have to be in your competitors’ cloud. Google Cloud has been doing that since it announced Google Anthos just over a year ago. It has been crafting its “assault”, starting with on-premises and Anthos on AWS. Anthos on Microsoft® Azure is coming and is currently in preview.
BigQuery Omni conversation starter
Two weeks ago, whilst the Google Cloud BigQuery Omni announcement was still under wraps, local Malaysian IT portal Enterprise IT News sent me the embargoed article to seek my views and opinions. I have to admit that I was ignorant about the deeper workings of BigQuery, and I had not fully gone through the workings of Google Anthos either. So I researched them.
Having done some small projects with Qubida (defunct) and Talend several years ago, I had grasped useful data analytics and data enablement concepts, so BigQuery and BigQuery Omni fitted into my understanding quite well. That triggered my interest in writing this blog and in meshing the persistent storage conundrum (at least for me it is something to be untangled) with Kubernetes, with GKE (Google Kubernetes Engine), and thus with Anthos as well.
For discussion's sake, here is an overview of BigQuery Omni.
My comments and views are in this EITN article “Google Cloud’s BigQuery Omni for Multi-cloud Analytics”.
GKE is key
BigQuery Omni is a game changer. It is Google Cloud's big bet, helmed by CEO Thomas Kurian, to win the multi-cloud and hybrid cloud realms over the next few years. As Kubernetes and containers take shape across on-premises and multi-cloud environments, GKE (Google Kubernetes Engine) is the core foundation. It underpins the Control Plane, the Service Mesh (powered by Istio), the Container Mobility (powered by Velostrata) and more. The Looker component adds data exploration and visualization to BigQuery Omni, and possibly to the telemetry data of the Anthos Service Mesh soon.
There are a lot of moving parts and a lot to take in. I have not fully digested Anthos yet, but the opening moves of this fantastical chess play are impressive. For the layman, the diagram below shows the GKE Control Plane lording over the key cloud providers and platforms. That is why it is a game changer.
Distributed Persistent Storage
Maybe I am just overthinking the whole CSI (Container Storage Interface) and Persistent Storage thingy. After a few iterations of deep-dive research, my reading is that the industry and the storage technology vendors have more or less settled on CSI as the de facto framework for provisioning persistent storage to Kubernetes clusters. Have a read of the 2 articles below to form your own opinion of persistent storage and Kubernetes.
- CNCF: A Complete Storage Guide for your Kubernetes Storage Problems
- Anthos: Kubernetes Infrastructure to make developers more productive
Despite all the hype and fanfare, Kubernetes at scale is hard. Persistent storage via CSI drivers provides basic storage and data management capabilities, but the lack of a distributed framework for persistent storage could make it difficult for Anthos (or any similar control plane service) to move Kubernetes clusters and their workloads, together with the persistent storage they are linked to, to another cloud or across multi-clouds. I don’t think persistent storage on one platform can be magically beamed from Cloud A to Cloud B at the snap of a finger and be ready for the Kubernetes cluster workloads (yet) just like that.
The “association-disassociation” of persistent data as pods and clusters are spun up and spun down, and the distribution of the storage's persistent state from one cloud platform to another, are not well formed, as far as my present knowledge and experience of Kubernetes and persistent storage go. Lesser implementations, such as asynchronous replication of Kubernetes volumes, exist, but they are not widespread and have not fully matured. So far, none of the many storage vendors I have researched has answered my question about distributed persistent storage well enough to fully relieve the itch. That itching question is still there for me, even after I posted it 2 years ago.
Anthos Ready Storage: Initial motley crew of 6.
Google Cloud has a partner initiative called Anthos Ready Storage. The initial 6 vendors qualified are:
Mentioned in this Newstack.io article, the Anthos Ready Storage CSI drivers will enable seamless integration with the 6 storage vendors [sic].
Besides the display of confidence of technically working well with these storage vendors (and vice versa), the Anthos Ready Storage initiative also wants to remove the uneasiness for organizations putting their enterprise and cloud native workloads on GKE and Anthos. The complexity and difficulty of dealing with persistent storage and Kubernetes are masked within the initiative, and having a managed persistent storage framework will speed the adoption of Anthos on-premises and multi-clouds.
The Newstack.io article lists 3 primary criteria that each storage vendor (as part of the Anthos Ready Storage initiative) must meet to work with Anthos on-prem:
- Demonstrated core Kubernetes functionality including dynamic provisioning of volumes via open and portable Kubernetes-native storage APIs.
- A proven ability to automatically manage storage across cluster scale-up and scale-down scenarios.
- A simplified deployment experience following Kubernetes practices.
But the requirements to qualify as Anthos Ready Storage are stated as:
- Ability to deploy the storage CSI driver and its dependencies, using the Kubernetes framework
- Core functions that customers require today, including dynamic provisioning of volumes via the Kubernetes native Storage APIs
- The ability to manage storage for Kubernetes scale-up and scale-down scenarios
- Support for workload portability with persistent storage for stateful workloads
There is no mention of distributed persistent storage, and this is something I hope both Anthos and the respective storage vendors will come forward with deeper details about in the future.
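To make "dynamic provisioning of volumes via the Kubernetes native Storage APIs" a little more concrete, here is a minimal Python sketch that builds a PersistentVolumeClaim manifest. It is only an illustration under assumptions: the claim name `analytics-data` and the storage class `vendor-csi-fast` are hypothetical and do not come from any Anthos Ready Storage vendor, and the helper `build_pvc` is my own naming.

```python
import json


def build_pvc(name, storage_class, size_gib, access_mode="ReadWriteOnce"):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    With dynamic provisioning, the claim names a StorageClass and the
    CSI driver behind that class creates the volume on demand.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Hypothetical class name; a real one is defined by the cluster admin.
            "storageClassName": storage_class,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, a format Kubernetes accepts alongside YAML.
    pvc = build_pvc("analytics-data", "vendor-csi-fast", 50)
    print(json.dumps(pvc, indent=2))
```

Note that the claim lives inside one cluster. Nothing in this object says how the data behind it would follow a workload to a cluster in another cloud.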
Multi-cloud ambitions and data gravity
Again, I conclude that I don’t know things deeply enough, and I am ready to be proven wrong. Here are my present closing thoughts for my punditry in this blog.
I wrap up with data gravity. The data that is tied to applications and workloads has weight. Data mobility is often an expensive affair that is not necessarily seamless and frictionless between premises and multi-clouds. There will be impact and consequences. Thus, I am of the mind that this could stifle the promise of multi-cloud ambitions, be it Anthos or somebody else.
[ 1st Final note: For the record, I have also trodden lightly into understanding cloud native storage platforms for Kubernetes such as Rancher Longhorn, OpenEBS, Portworx Enterprise, StorageOS and Diamanti Spektra. It is still early days. ]
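A rough back-of-envelope calculation shows why data has weight. The Python sketch below estimates how long it takes to move a dataset across a network link. The figures in the example (100 TB, a 10 Gbps link, 70% usable throughput) are my own illustrative assumptions, not measurements, and the estimate ignores egress charges, which add cost on top of time.

```python
def transfer_hours(data_tb, link_gbps, efficiency=0.7):
    """Estimate hours needed to move data_tb terabytes over a link.

    data_tb    -- dataset size in decimal terabytes (1 TB = 1e12 bytes)
    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- assumed fraction of the link that is actually usable
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, link speed or efficiency")
    bits = data_tb * 1e12 * 8                      # bytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative assumption: 100 TB over a 10 Gbps link at 70% efficiency.
    hours = transfer_hours(100, 10)
    print(f"about {hours:.1f} hours")
```

Even with generous assumptions the move takes more than a day, and the persistent volume is stale for the workload until the last byte lands.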
[ 2nd Final note: 2 notable Kubernetes storage vendors show promise about “distributed persistent storage”. See below ]
- Robin.io provides Multi-Cloud portability for Stateful Apps
- Portworx to add application profiles to persistent container storage