How well do you know your data and the storage platform that processes the data

Last week was consumed by many conversations on this topic. I was quite jaded, really. Unfortunately many still take a very simplistic view of all the storage technology, or should I say over-marketing of the storage technology. So much so that the end users make incredible assumptions of the benefits of a storage array or software defined storage platform or even cloud storage. And too often caveats of turning on a feature and tuning a configuration to the max are discarded or neglected. Regards for good storage and data management best practices? What’s that?

I share some of my thoughts handling conversations like these and try to set the right expectations rather than overhype a feature or a function in the data storage services.

Complex data networks and the storage services that serve it

I/O Characteristics

Applications and workloads (A&W) read and write from the data storage services platforms. These could be local DAS (direct access storage), network storage arrays in SAN and NAS, and now objects, or from cloud storage services. Regardless of structured or unstructured data, different A&Ws have different behavioural I/O patterns in accessing data from storage. Therefore storage has to be configured at best to match these patterns, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:

  • Random and Sequential patterns
  • Block sizes of these A&Ws ranging from typically 4K to 1024K.
  • Causal effects of synchronous and asynchronous I/Os to and from the storage

Continue reading

Storage Elephant Compute Birds

Data movement is expensive. Not just costs, but also latency and resources as well. Thus there were many narratives to move compute closer to where the data is stored because moving compute is definitely more economical than moving data. I borrowed the analogy of the 2 animals from some old NetApp® slides which depicted storage as the elephant, and compute as birds. It was the perfect analogy, because the storage is heavy and compute is light.

“Close up of a white Great Egret perching on top of an African Elephant aa Amboseli national park, Kenya”

Before the animals representation came about I used to use the term “Data locality, Data Mobility“, because of past work on storage technology in the Oil & Gas subsurface data management pipeline.

Take stock of your data movement

I had recent conversations with an end user who has been paying a lot of dollars keeping their “backup” and “archive” in AWS Glacier. The S3 storage is cheap enough to hold several petabytes of data for years, because the IT folks said that the data in AWS Glacier are for “backup” and “archive”. I put both words in quotes because they were termed as “backup” and “archive” because of their enterprise practice. However, the face of their business is changing. They are in manufacturing, oil and gas downstream, and the definitions of “backup” and “archive” data has changed.

For one, there is a strong demand for reusing the past data for various reasons and these datasets have to be recalled from their cloud storage. Secondly, their data movement activities still mimicked what they did in the past during their enterprise storage days. It was a classic lift-and-shift when they moved to the cloud, and not taking stock of  their data movements and the operations they ran on these datasets. Still ongoing, their monthly AWS cost a bomb.

Continue reading

The Starbucks model for Storage-as-a-Service

Starbucks™ is not a coffee shop. It purveys beyond coffee and tea, and food and puts together the yuppie beverages experience. The intention is to get the customers to stay as long as they can, and keep purchasing the Starbucks’ smorgasbord of high margin provisions in volume. Wifi, ambience, status, coffee or tea with your name on it (plenty of jokes and meme there), energetic baristas and servers, fancy coffee roasts and beans et. al. All part of the Starbucks™-as-a-Service pleasurable affair that intends to lock the customer in and have them keep coming back.

The Starbucks experience

Data is heavy and they know it

Unlike compute and network infrastructures, storage infrastructures holds data persistently and permanently. Data has to land on a piece of storage medium. Coupled that with the fact that data is heavy, forever growing and data has gravity, you have a perfect recipe for lock-in. All storage purveyors, whether they are on-premises data center enterprise storage or public cloud storage, and in between, there are many, many methods to keep the data chained to a storage technology or a storage service for a long time. The storage-as-a-service is like tying the cow to the stake and keeps on milking it. This business model is very sticky. This stickiness is also a lock-in mechanism.

Continue reading

What the heck is Storage Modernization?

We often hear the word “modernization” thrown around these days. The push is to get the end user to refresh their infrastructure, and the storage infrastructure market is rife with modernization word. Is your storage ripe for “modernization“?

Many possibilities to modernize storage

To modernize, it has to be relative to legacy storage hardware, and the operating environment that came with it. But if the so-called “legacy” still does the job, should you modernize?

Big Data is right

When the word “Big Data” came into prominence a while back, it stirred the IT industry into a frenzy. At one point, Apache Hadoop became the poster elephant (pun intended) for this exciting new segment. So many Vs came out, but I settled with 4 Vs as the framework of my IT conversations. The 4Vs we often hear are:

  • Volume
  • Velocity
  • Variety
  • Veracity

Continue reading

What If – The other side of Storage FUDs

Streaming on Disney+ now is Marvel Studios’ What If…? animated TV series. In the first episode, Peggy Carter, instead of Steve Rogers, took the super soldier serum and became the first Avenger. The TV series explores alternatives and possibilities of what we may have considered as precept and the order of things.

As storage practitioners, we are often faced with certain “dogmatic” arguments which were often a mix of measured actuality and marketing magic – aka FUD (fear, uncertainty, doubt). Time and again, we are thrown a curve ball, like “Oh, your competitor can do this. Can you?” Suddenly you are feeling pinned to a corner, and the pressure to defend your turf rises. You fumbled; You have no answer; Game over!

I experienced these hearty objections many times over. The best experience was one particular meeting I had during my early days with NetApp® in 2000. I was only 1-2 months with the company, still wet between the ears with the technology. I was pitching the SnapMirror® to Ericsson Malaysia when the Scandinavian manager said, “I think you are lying!“. I was lost without a response. I fumbled spectacularly although I couldn’t remember if we won or lost that opportunity.

Here are a few I often encountered. Let’s play the game of What If …?

What If …?

Continue reading

SSOT of Files

[ This is part two of “Where are your files living now?”. You can read Part One here ]

Data locality, Data mobility“. It was a term I like to use a lot when describing about data consolidation, leading to my mention about files and folders, and where they live in my previous blog. The thinking of where the files and folders are now as in everywhere as they can be in a plethora of premises stretches the premise of SSOT (Single Source of Truth). And this expatriation of files with minimal checks and balances disturbs me.

A year ago, just before I joined iXsystems, I was given Google® embargoed news, probably a week before they announced BigQuery Omni. Then I was interviewed by Enterprise IT News, a local Malaysian technology news portal to provide an opinion quote. This was what I quoted:

“’The data warehouse in the cloud’ managed services of Big Query is underpinned by Google® Anthos, its hybrid cloud infra and service management platform based on GKE (Google® Kubernetes Engine). The containerised applications, both on-prem and in the multi-clouds, would allow Anthos to secure and orchestrate infra, services and policy management under one roof.”

I further quoted ” The data repositories remain in each cloud is good to address data sovereignty, data security concerns but it did not mention how it addresses “single source of truth” across multi-clouds.

Single Source of Truth – regardless of repositories

Continue reading

Enterprise Storage is not just a Label

I have many anecdotes around the topic of Enterprise Storage, but the conversations in the past 2 weeks made it important for me to share this.

Enterprise Storage is …

Amusing, painful, angry

I get riled up whenever people do not want to be educated about Enterprise Storage. Here are a few that happened in the last 2 weeks.

[ Story #1 ]

A guy was building his own storage for cryptocurrency. He was informed by his supplier that the RAID card was enterprise, and he could get the best performance using “Enterprise” RAID-0.

  • Well, “Enterprise” RAID-0 volume crashed, and he lost all data. Painfully, he said he lost a hefty sum financially

[ Story #2 ]

A media company complained about the reliability of previous storage vendor. The GM was shopping around and was told that there are “Enterprise” SATA drives and the reliability is as good, if not better than SAS drives.

  • The company wanted a fully reliable Enterprise Storage system with 99.999% availability, and yet the SATA interface was not meant to build a more highly reliable enterprise storage. The GM insisted to use “Enterprise” SATA drives for his “enterprise” storage system instead of SAS.  

[ Story #3 ]

An IT admin of a manufacturing company claimed that they had an “Enterprise Storage” system for a few years, and could not figure out why his hard disk drives would die every 12-15 months.

  • He figured out that the drives supplied by his vendor were consumer SATA drives, even though he was told it was an “Enterprise Storage” system when he bought the system.

Continue reading

Multicloud is sprouting Storage Silos

Grain Silos

We get an avalanche of multicloud selling from storage vendors. We get promises and benefits of multicloud but from whose point of view?

Multicloud is multiple premises

This is an overly simplistic example how I created 3 copies of the same spreadsheet yesterday. I have a quotation on Google Sheets. A fairly complicated one. Someone wanted it in Excel format, but the format and the formulas were all messed up when I tried to download it as XLSX. What I had to do was to download the Google Sheets as ODS (OpenDocument Spreadsheet) format to my laptop, and then upload the LibreOffice file to my OneDrive account, and use Excel Online to open the ODS file and saved as XLSX. In one fell swoop, I have the same spreadsheet in Google Drive, my laptop and OneDrive. 3 copies in 3 different premises. 

As we look to the behaviour of data creation and data acquisition, data sharing and data movement, the central repository is the gold image, the most relevant copy of the data. However, for business reasons, data has to be moved to where the applications are. It could be in cloud A or cloud B or cloud C or it could be on-premises. The processed output from cloud A is stored in cloud A, and likewise, cloud B in cloud B and so on.

To get the most significant and relevant copy, data from all premises must be consolidated, thus it has to be moved to a centralized data storage repository. But intercloud data movement is bogged down by egress fees, latency, data migration challenges (like formats and encoding), security, data clearance policies and many other hoops and hurdles.

With all these questions and concerns in mind, the big question mark is “Is multicloud really practical?” From a storage guy like me who loves a great data management story, “It is not. Multicloud creates storage silos“.

Continue reading

Storage in a shiny multi-cloud space

The multi-cloud for infrastructure-as-a-service (IaaS) era is not here (yet). That is what the technology marketers want you to think. The hype, the vapourware, the frenzy. It is what they do. The same goes to technology analysts where they describe vision and futures, and the high level constructs and strategies to get there. The hype of multi-cloud is often thought of running applications and infrastructure services seamlessly in several public clouds such as Amazon AWS, Microsoft® Azure and Google Cloud Platform, and linking it to on-premises data centers and private clouds. Hybrid is the new black.

Multicloud connectivity to public cloud providers and on-premises private cloud

Multi-Cloud, on-premises, public and hybrid clouds

And the aspiration of multi-cloud is the right one, when it is truly ready. Gartner® wrote a high level article titled “Why Organizations Choose a Multicloud Strategy“. To take advantage of each individual cloud’s strengths and resiliency in respective geographies make good business sense, but there are many other considerations that cannot be an afterthought. In this blog, we look at a few of them from a data storage perspective.

In the beginning there was … 

For this storage dinosaur, data storage and compute have always coupled as one. In the mainframe DASD days. these 2 were together. Even with the rise of networking architectures and protocols, from IBM SNA, DECnet, Ethernet & TCP/IP, and Token Ring FC-SAN (sorry, this is just a joke), the SANs, the filers to the servers were close together, albeit with a network buffered layer.

A decade ago, when the public clouds started appearing, data storage and compute were mostly inseparable. There was demarcation of public clouds and private clouds. The notion of hybrid clouds meant public clouds and private clouds can intermix with on-premise computing and data storage but in almost all cases, this was confined to a single public cloud provider. Until these public cloud providers realized they were not able to entice the larger enterprises to move their IT out of their on-premises data centers to the cloud convincingly. So, these public cloud providers decided to reverse their strategy and peddled their cloud services back to on-prem. Today, Amazon AWS has Outposts; Microsoft® Azure has Arc; and Google Cloud Platform launched Anthos.

Continue reading