It is certainly encouraging to see both NAS protocols, NFS and SMB, featured well in the latest VMware® vSAN 7 Update 1 release. The NFS v3 and v4.1 support was already in vSAN 7.0 when it was earlier announced as part of its Native File Services for vSAN. But some years ago, NFS was not always the primary storage protocol of choice. SAN protocols, Fibre Channel and iSCSI, were almost always designated to serve enterprise applications. At the client side, Windows became prominent, and the SMB/CIFS protocol dominated the landscape of the desktop. This further pushed NFS into the back closet.
NFS or Network File System has its naysayers. The venerable, but often maligned distributed network file protocol is 36 years today. In storage vendors such as NetApp®, VAST Data, Pure Storage FlashBlade, and Dell EMC Isilon, NFS is still positioned as the primary file protocol for manufacturing testers on the shop floor, EDA/eCAD applications, seismic and subsurface applications in Oil & Gas and many more. In another development, just like its presence in the vSAN Native Services,, NFS has also quietly embedded itself into many storage platforms to serve the data platform services within the respective framework itself.
And I have experienced NFS from the client side to the enterprise applications and more, and I take this opportunity to pay tribute.
The multi-cloud for infrastructure-as-a-service (IaaS) era is not here (yet). That is what the technology marketers want you to think. The hype, the vapourware, the frenzy. It is what they do. The same goes to technology analysts where they describe vision and futures, and the high level constructs and strategies to get there. The hype of multi-cloud is often thought of running applications and infrastructure services seamlessly in several public clouds such as Amazon AWS, Microsoft® Azure and Google Cloud Platform, and linking it to on-premises data centers and private clouds. Hybrid is the new black.
Multi-Cloud, on-premises, public and hybrid clouds
And the aspiration of multi-cloud is the right one, when it is truly ready. Gartner® wrote a high level article titled “Why Organizations Choose a Multicloud Strategy“. To take advantage of each individual cloud’s strengths and resiliency in respective geographies make good business sense, but there are many other considerations that cannot be an afterthought. In this blog, we look at a few of them from a data storage perspective.
In the beginning there was …
For this storage dinosaur, data storage and compute have always coupled as one. In the mainframe DASD days. these 2 were together. Even with the rise of networking architectures and protocols, from IBM SNA, DECnet, Ethernet & TCP/IP, and Token Ring FC-SAN (sorry, this is just a joke), the SANs, the filers to the servers were close together, albeit with a network buffered layer.
A decade ago, when the public clouds started appearing, data storage and compute were mostly inseparable. There was demarcation of public clouds and private clouds. The notion of hybrid clouds meant public clouds and private clouds can intermix with on-premise computing and data storage but in almost all cases, this was confined to a single public cloud provider. Until these public cloud providers realized they were not able to entice the larger enterprises to move their IT out of their on-premises data centers to the cloud convincingly. So, these public cloud providers decided to reverse their strategy and peddled their cloud services back to on-prem. Today, Amazon AWS has Outposts; Microsoft® Azure has Arc; and Google Cloud Platform launched Anthos.
Google CEO Sundar Pichai announcing Google Anthos at Next ’19
BigQuery Omni conversation starter
2 weeks ago, whilst the Google Cloud BigQuery Omni announcement was still under wraps, local Malaysian IT portal Enterprise IT News sent me the embargoed article to seek my views and opinions. I have to admit that I was ignorant about the deeper workings of BigQuery, and haven’t fully gone through the works of Google Anthos as well. So I researched them.
Having done some small works on Qubida (defunct) and Talend several years ago, I have grasped useful data analytics and data enablement concepts, and so BigQuery fitted into my understanding of BigQuery Omni quite well. That triggered my interests to write this blog and meshing the persistent storage conundrum (at least for me it is something to be untangled) to Kubernetes, to GKE (Google Kubernetes Engine), and thus Anthos as well.
For discussion sake, here is an overview of BigQuery Omni.
An overview of Google Cloud BigQuery Omni on multiple cloud providers
When the elephant rumbles in the jungle, the whole village takes notice. That was what happened 2 weeks ago when Commvault® announced their multi-year agreement to place Metallic™ into deeper integration with Microsoft® Azure. This strategic partnership will consummate several key areas between the 2 companies, which are Engineering, Go To Market (GTM) and Sales.
The “low hanging fruit” move is of course the tight(er) integration with Microsoft® Azure Blob Storage but the more exciting anticipation is “What else is next“.
An O’Reilly® Media Cloud Adoption survey in January and February 2020 (just before the COVID-19 pandemic) revealed that 25% of the respondents “said that their companies plan to move all of their applications to a cloud context in the next year“. This is no coincidence. It is now Cloud First; Cloud Next; Cloud Big.
Kubernetes is on fire. Last week VMware® released the State of Kubernetes 2020 report which surveyed companies with 1,000 employees and above. Results were not surprising as the adoptions of this nascent technology are booming. But persistent storage remained the nagging concern for the Kubernetes serving the infrastructure resources to applications instances running in the containers of a pod in a cluster.
The standardization of storage resources have settled with CSI (Container Storage Interface). Storage vendors have almost, kind of, sort of agreed that the API objects such as PersistentVolumes, PersistentVolumeClaims, StorageClasses, along with the parameters would be the way to request the storage resources from the Pre-provisioned Volumes via the CSI driver plug-in. There are already more than 50 vendor specific CSI drivers in Github.
Kubernetes and the CSI (Container Storage Interface) logos
The CSI plug-in method is the only way for Kubernetes to scale and keep its dynamic, loadable storage resource integration with external 3rd party vendors, all clamouring to grab a piece of this burgeoning demands both in the cloud and in the enterprise.
[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at the event. The content of this blog is of my own opinions and views ]
A funny photo (below) came up on my Facebook feed a couple of weeks back. In an honest way, it depicted how a developer would think (or the lack of thinking) about the storage infrastructure designs and models for the applications and workloads. This also reminded me of how DBAs used to diss storage engineers. “I don’t care about storage, as long as it is RAID 10“. That was aeons ago 😉
The world of developers and the world of infrastructure people are vastly different. Since cloud computing birthed, both worlds have collided and programmable infrastructure-as-code (IAC) have become part and parcel of cloud native applications. Of course, there is no denying that there is friction.
In the world of software development and delivery, DevOps has taken a liking to containers. Containers make it easier to host and manage life-cycle of web applications inside the portable environment. It packages up application code other dependencies into building blocks to deliver consistency, efficiency, and productivity. To scale to a multi-applications, multi-cloud with th0usands and even tens of thousands of microservices in containers, the Kubernetes factor comes into play. Kubernetes handles tasks like auto-scaling, rolling deployment, computer resource, volume storage and much, much more, and it is designed to run on bare metal, in the data center, public cloud or even a hybrid cloud.
[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog is of my own opinions and views ]
Cloud computing will have challenges processing data at the outer reach of its tentacles. Edge Computing, as it melds with the Internet of Things (IoT), needs a different approach to data processing and data storage. Data generated at source has to be processed at source, to respond to the event or events which have happened. Cloud Computing, even with 5G networks, has latency that is not sufficient to how an autonomous vehicle react to pedestrians on the road at speed or how a sprinkler system is activated in a fire, or even a fraud detection system to signal money laundering activities as they occur.
Furthermore, not all sensors, devices, and IoT end-points are connected to the cloud at all times. To understand this new way of data processing and data storage, have a look at this video by Jay Kreps, CEO of Confluent for Kafka® to view this new perspective.
Data is continuously and infinitely generated at source, and this data has to be compiled, controlled and consolidated with nanosecond precision. At Storage Field Day 19, an interesting open source project, Pravega, was introduced to the delegates by DellEMC. Pravega is an open source storage framework for streaming data and is part of Project Nautilus.
The data generated at source (end-points, sensors, devices) is serialized, timestamped (as event occurs), continuous and infinite. These are the properties of a time series data stream, and to make sense of the streaming data, new data formats such as Avro, Parquet, Orc pepper the landscape along with the more mature JSON and XML, each with its own strengths and weaknesses.
You can learn more about these data formats in the 2 links below:
Many time series projects started as DIY projects in many organizations. And many of them are still DIY projects in production systems as well. They depend on tribal knowledge, and these databases are tied to an unmanaged storage which is not congruent to the properties of streaming data.
At the storage end, the technologies today still rely on the SAN and NAS protocols, and in recent years, S3, with object storage. Block, file and object storage introduce layers of abstraction which may not be a good fit for streaming data.
[Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]
This blog was not intended because it was not in my plans to write it. But a string of events happened in the Storage Field Day 19 week and I have the fodder to share my thoughts. Hadoop is indeed dead.
Warning: There are Lord of the Rings references in this blog. You might want to do some research. 😉
Storage metrics never happened
The fellowship of Arjan Timmerman, Keiran Shelden, Brian Gold (Pure Storage) and myself started at the office of Pure Storage in downtown Mountain View, much like Frodo Baggins, Samwise Gamgee, Peregrine Took and Meriadoc Brandybuck forging their journey vows at Rivendell. The podcast was supposed to be on the topic of storage metrics but was unanimously swung to talk about Hadoop under the stewardship of Mr. Stephen Foskett, our host of Tech Field Day. I saw Stephen as Elrond Half-elven, the Lord of Rivendell, moderating the podcast as he would have in the plans of decimating the One Ring in Mount Doom.
So there we were talking about Hadoop, or maybe Sauron, or both.
The photo of the Oliphaunt below seemed apt to describe the industry attacks on Hadoop.
Cloud Data Management is a tricky word. Often vague, ambigious, how exactly would you define “Cloud Data Management“?
Fresh off the boat from Commvault GO 2019 in Denver, Colorado last week, I was invited to sample Veeam a few days ago at their Solution Day and soak into their rocketing sales in Asia Pacific, and strong market growth too. They reported their Q3 numbers this week, impressing many including yours truly.
I went to the seminar early in the morning, quite in awe of their vibrant partners and resellers activities and ecosystem compared to the tepid Commvault efforts in Malaysia over the past decade. Veeam’s presence in Malaysia is shorter than Commvault’s but they are able to garner a stronger following with partners and customers alike.
[Disclosure: I was invited by Commvault as a Media person and Social Ambassador to their Commvault GO 2019 Conference and also a Tech Field Day eXtra delegate from Oct 13-17, 2019 in the Denver CO, USA. My expenses, travel, accommodation and conference fees were covered by Commvault, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]
The waltz across the Commvault-Hedvig mine field will not be easy. Commvault will have a lot of open discussions about their acquisition of Hedvig and how Hedvig “primary storage platform” will fit into a “secondary storage framework” of Commvault. The outcome of this consummation is yet to appear as a structured form. The storyline will eventually form as Commvault’s diligence to define their strategy moving forward.
A lot of excitement and buzz were generated around the metallic, the Commvault venture into Software-as-a-Service (SaaS). The SaaS solution is targeted at the mid-market for organizations with 500-2500 staff count. Its simplicity and pricing were the 2 things which gave me a good feeling all over. There is even a 45-day trial for metallic.
My Day 2 itinerary was more specific because my agenda for this trip was to seek answers to the realization of Commvault-Hedvig.
Commvault took the distinction of using the vision of a DataBrain (#databrain) to define their strategy. From the picture below, the left and right hemisphere of the DataBrain forms the Storage Management piece on the left and Data Management on the right.