Storage in a shiny multi-cloud space

The multi-cloud for infrastructure-as-a-service (IaaS) era is not here (yet). That is what the technology marketers want you to think. The hype, the vapourware, the frenzy. It is what they do. The same goes to technology analysts where they describe vision and futures, and the high level constructs and strategies to get there. The hype of multi-cloud is often thought of running applications and infrastructure services seamlessly in several public clouds such as Amazon AWS, Microsoft® Azure and Google Cloud Platform, and linking it to on-premises data centers and private clouds. Hybrid is the new black.

Multicloud connectivity to public cloud providers and on-premises private cloud

Multi-Cloud, on-premises, public and hybrid clouds

And the aspiration of multi-cloud is the right one, when it is truly ready. Gartner® wrote a high level article titled “Why Organizations Choose a Multicloud Strategy“. To take advantage of each individual cloud’s strengths and resiliency in respective geographies make good business sense, but there are many other considerations that cannot be an afterthought. In this blog, we look at a few of them from a data storage perspective.

In the beginning there was … 

For this storage dinosaur, data storage and compute have always coupled as one. In the mainframe DASD days. these 2 were together. Even with the rise of networking architectures and protocols, from IBM SNA, DECnet, Ethernet & TCP/IP, and Token Ring FC-SAN (sorry, this is just a joke), the SANs, the filers to the servers were close together, albeit with a network buffered layer.

A decade ago, when the public clouds started appearing, data storage and compute were mostly inseparable. There was demarcation of public clouds and private clouds. The notion of hybrid clouds meant public clouds and private clouds can intermix with on-premise computing and data storage but in almost all cases, this was confined to a single public cloud provider. Until these public cloud providers realized they were not able to entice the larger enterprises to move their IT out of their on-premises data centers to the cloud convincingly. So, these public cloud providers decided to reverse their strategy and peddled their cloud services back to on-prem. Today, Amazon AWS has Outposts; Microsoft® Azure has Arc; and Google Cloud Platform launched Anthos.

Continue reading

Give back or no give

[ Disclosure: I work for iXsystems™ Inc. Views and opinions are my own. ]

If my memory served me right, I recalled the illustrious leader of the Illumos project, Garrett D’Amore ranting about companies, big and small, taking OpenZFS open source codes and projects to incorporate into their own technology but hardly ever giving back to the open source community. That was almost 6 years ago.

My thoughts immediately go back to the days when open source was starting to take off back in the early 2000s. Oracle 9i database had just embraced Linux in a big way, and the book by Eric S. Raymond, “The Cathedral and The Bazaar” was a big hit.

The Cathedral & The Bazaar by Eric S. Raymond

Since then, the blooming days of proprietary software world began to wilt, and over the next twenty plus year, open source software has pretty much taken over the world. Even Microsoft®, the ruthless ruler of the Evil Empire caved in to some of the open source calls. The Microsoft® “I Love Linux” embrace definitely gave the victory feeling of the Rebellion win over the Empire. Open Source won.

Open Source bag of worms

Even with the concerted efforts of the open source communities and projects, there were many situations which have caused frictions and inadvertently, major issues as well. There are several open source projects licenses, and they are not always compatible when different open source projects mesh together for the greater good.

On the storage side of things, 2 “incidents” caught the attention of the masses. For instance, Linus Torvalds, Linux BDFL (Benevolent Dictator for Life) and emperor supremo said “Don’t use ZFS” partly due to the ignorance and incompatibility of Linux GPL (General Public License) and ZFS CDDL (Common Development and Distribution License). That ruffled some feathers amongst the OpenZFS community that Matt Ahrens, the co-creator of the ZFS file system and OpenZFS community leader had to defend OpenZFS from Linus’ comments.

Continue reading

The prudence needed for storage technology companies

Blitzscaling has been on my mind a lot. Ever since I discovered that word a while back, it has returned time and time again to fill my thoughts. In the wake of COVID-19, and in the mire of this devastating pandemic, is blitzscaling still the right strategy for this generation of storage technology, hyperconverged, data management and cloud storage startups?

What the heck is Blitzscaling? 

For the uninformed, here’s a video of Reid Hoffman, co-founder of Linked and a member of the Paypal mafia, explaining Blitzscaling.

Blitzscaling is about hyper growing, scaling ultra fast and rocketing to escape velocity, at the expense of things like management efficiency, financial prudence, profits and others. While this blog focuses on storage companies, blitzscaling is probably most recognizable in the massive expansion of Uber (and contraction) a few years ago. In the US, the ride hailing war is between Uber and Lyft, but over here in South East Asia, just a few years back, it was between Uber and Grab. In China it was Uber and Didi.

From the storage angle, 2 segments exemplified the blitzscaling culture between 2015 and 2020.

  • All Flash Startups
  • Hyper Converged Infrastructure Startups

Continue reading

Persistent Storage could stifle Google Anthos multi-cloud ambitions

To win in the multi-cloud game, you have to be in your competitors’ cloud. Google Cloud has been doing that since they announced Google Anthos just over a year ago. They have been crafting their “assault”, starting with on-premises, and Anthos on AWS. Anthos on Microsoft® Azure is coming, currently in preview mode.

Google CEO Sundar Pichai announcing Google Anthos at Next ’19

BigQuery Omni conversation starter

2 weeks ago, whilst the Google Cloud BigQuery Omni announcement was still under wraps, local Malaysian IT portal Enterprise IT News sent me the embargoed article to seek my views and opinions. I have to admit that I was ignorant about the deeper workings of BigQuery, and haven’t fully gone through the works of Google Anthos as well. So I researched them.

Having done some small works on Qubida (defunct) and Talend several years ago, I have grasped useful data analytics and data enablement concepts, and so BigQuery fitted into my understanding of BigQuery Omni quite well. That triggered my interests to write this blog and meshing the persistent storage conundrum (at least for me it is something to be untangled) to Kubernetes, to GKE (Google Kubernetes Engine), and thus Anthos as well.

For discussion sake, here is an overview of BigQuery Omni.

An overview of Google Cloud BigQuery Omni on multiple cloud providers

My comments and views are in this EITN article “Google Cloud’s BigQuery Omni for Multi-cloud Analytics”.

Continue reading

Down the rabbit hole with Kubernetes Storage

Kubernetes is on fire. Last week VMware® released the State of Kubernetes 2020 report which surveyed companies with 1,000 employees and above. Results were not surprising as the adoptions of this nascent technology are booming. But persistent storage remained the nagging concern for the Kubernetes serving the infrastructure resources to applications instances running in the containers of a pod in a cluster.

The standardization of storage resources have settled with CSI (Container Storage Interface). Storage vendors have almost, kind of, sort of agreed that the API objects such as PersistentVolumes, PersistentVolumeClaims, StorageClasses, along with the parameters would be the way to request the storage resources from the Pre-provisioned Volumes via the CSI driver plug-in. There are already more than 50 vendor specific CSI drivers in Github.

Kubernetes and CSI initiative

Kubernetes and the CSI (Container Storage Interface) logos

The CSI plug-in method is the only way for Kubernetes to scale and keep its dynamic, loadable storage resource integration with external 3rd party vendors, all clamouring to grab a piece of this burgeoning demands both in the cloud and in the enterprise.

Continue reading

A Dialogue between 2 Drives

I was talking to an end user who was slowly getting exposed to the cloud amid this Covid-19 pandemic. The whole work from home thingy was not new to him, but the scale of the practice suddenly escalated when more than 80 of his staff have to work from wherever they were stuck at during the past 6 weeks. Initially all of his staff had to alternate their folders and files access because their Sonicwall® Global Client license and SSL VPN Clients were inadequate. Even after their upgrade of the licenses, the performance of getting the folders and files through the Z: drive was poor and the network was chocked up. I told them that regardless, the SMB protocol of the NAS shared folders was chatty and generated a lot of network traffic on the VPN, along with the inadequacies of running this over the wide area Internet network. Staff productivity obviously nosedived.

We are now exploring putting their work in the cloud but maintaining a consistent synchronized set of folders and files at all times. Wasabi® Cloud has emerged the most attractive price/GB/month and no egress or API requests fees.

Combining 2 shared drives into one

NAS Drive talking to Cloud Drive like 2 buddies

Now here is a story of 2 Drives

The end user is not an IT savvy user. They were unfamiliar with Cloud Storage other than the free personal ones like Google Drive, or Dropbox. They have more than 200TB and I have introduced to them Wasabi® Cloud. They were very familiar with their Z:, their NAS Drive. I introduced to them the Cloud Drive.

NAS: Hey, how’s it going?

Cloud: Not bad. My boss and your boss are talking about bringing me and Wasabi® Cloud to join your gang. Hope you are OK with that.

Continue reading

Falconstor Software Defined Data Preservation for the Next Generation

Falconstor® Software is gaining momentum. Given its arduous climb back to the fore, it is beginning to soar again.

Tape technology and Digital Data Preservation

I mentioned that long term digital data preservation is a segment within the data lifecycle which has merits and prominence. SNIA® has proved that this is a strong growing market segment through its 2007 and 2017 “100 Year Archive” surveys, respectively. 3 critical challenges of this long, long-term digital data preservation is to keep the archives

  • Accessible
  • Undamaged
  • Usable

For the longest time, tape technology has been the king of the hill for digital data preservation. The technology is cheap, mature, and many enterprises has built their long term strategy around it. And the pulse in the tape technology market is still very healthy.

The challenges of tape remain. Every 5 years or so, companies have to consider moving the data on the existing tape technology to the next generation. It is widely known that LTO can read tapes of the previous 2 generations, and write to it a generation before. The tape transcription process of migrating digital data for the sake of data preservation is bad because it affects the structural integrity and quality of the content of the data.

In my times covering the Oil & Gas subsurface data management, I have seen NOCs (national oil companies) with 500,000 tapes of all generations, from 1/2″ to DDS, DAT to SDLT, 3590 to LTO 1-7. And millions are spent to transcribe these tapes every few years and we have folks like Katalyst DM, Troika and more hovering this landscape for their fill.

Continue reading

The Falcon to soar again

One of the historical feats which had me mesmerized for a long time was the 14-year journey China’s imperial treasures took to escape the Japanese invasion in the early 1930s, sandwiched between rebellions and civil wars in China. More than 20,000 pieces of the imperial treasures took a perilous journey to the west and back again. Divided into 3 routes over a decade and four years, not a single piece of treasure was broken or lost. All in the name of preservation.

Today, that 20,000 over pieces live in perpetuity in 2 palaces – Beijing Palace Museum in China and National Palace Museum Taipei in Taiwan

Digital data preservation

Digital data preservation is on another end of the data lifecycle spectrum. More often than not, it is not the part that many pay attention to. In the past 2 decades, digital data has grown so much that it is now paramount to keep the data forever. Mind you, this is not the data hoarding kind but to preserve the knowledge and wisdom which is in the digital content of the data.

[ Note: If you are interested to know more about Data -> Information -> Knowledge -> Wisdom, check out my 2015 article on LinkedIn ]

SNIA (Storage Networking Industry Association) conducted 2 surveys – one in 2007 and another in 2017 – called the 100 Year Archive, and found that the requirement for preserving digital data has grown multiple folds over the 10 years. In the end, the final goal is to ensure that the perpetual digital contents are

  • Accessible
  • Undamaged
  • Usable

All at an affordable cost. Therefore, SNIA has the vision that the digital content must transcend beyond the storage medium, the storage system and the technology that holds it.

The Falcon reemerges

A few weeks ago, I had the privilege to speak with Falconstor® Software‘s David Morris (VP of Global Product Strategy & Marketing) and Mark Delsman (CTO). It was my first engagement with Falconstor® in almost 9 years! I wrote a piece of Falconstor® in my blog in 2011.

Continue reading

Iconik Content Management Solutions with FreeNAS – Part 2

[ Note: This is still experimental and should not be taken as production materials. I took a couple days over the weekend to “muck” around the new Iconik plug-in in FreeNAS™ to prepare for as a possible future solution. ]

This part is the continuation of Part 1 posted earlier.

iconik has partnered with iXsystems™ almost a year ago. iconik is a cloud-based media content management platform. Its storage repository has many integration with public cloud storage such as Google Cloud, Wasabi® Cloud and more. The on-premises storage integration is made through iconik storage gateway, and it presents itself to FreeNAS™ and TrueNAS® via plugins.

For a limited, you get free access to iconik via this link.

iconik  – The Application setup

[ Note: A lot of the implementation details come from this iXsystems™ documentation by Joe Dutka. This is an updated version for the latest 11.3 U1 release ]

iconik is feature rich and navigating it to setup the storage gateway can be daunting. Fortunately the iXsystems™ documentation was extremely helpful. It is also helpful to consider this as a 2-step approach so that you won’t get overwhelmed of what is happening.

  • Set up the Application section
    • Get Application ID
    • Get Authorization Token
  • Set up the Storage section
    • Get Storage ID

The 3 credentials (Application ID, Authorization Token, Storage ID) are required to set up the iconik Storage Gateway at the FreeNAS™ iconik plug-in setup.

Continue reading

Iconik Content Management Solutions with FreeNAS – Part 1

[ Note: This is still experimental and should not be taken as production materials. I took a couple days over the weekend to “muck” around the new Iconik plug-in in FreeNAS™ to prepare for as a possible future solution. ]

The COVID-19 situation goes on unabated. A couple of my customers asked about working from home and accessing their content files and coincidentally both are animation studios. Meanwhile, there was another opportunity asking about a content management solution that would work with the FreeNAS™ storage system we were proposing. Over the weekend, I searched for a solution that would combine both content management and cloud access that worked with both FreeNAS™ and TrueNAS®, and I was glad to find the iconik and TrueNAS® partnership.

In this blog (and part 2 later), I document the key steps to setup the iconik plug-in with FreeNAS™. I am using FreeNAS™ 11.3U1.

Dataset 777

A ZFS dataset assigned to be the storage repository for the “Storage Target” in iconik. Since iconik has a different IAM (identity access management) than the user/group permissions in FreeNAS, we have make the ZFS dataset to have Read/Write access to all. That is the 777 permission in Unix speak. Note that there is a new ACL manager in version 11.3, and the permissions/access rights screenshot is shown here.

Take note that this part is important. We have to assign @everyone to have Full Control because the credentials at iconik is tied to the permissions we set for @everyone. Missing this part will deny the iconik storage gateway scanner to peruse this folder, and the status will remain “Inactive”.  We will discuss this part more in Part 2.

Continue reading