Around the year 2016, I started to put together a better structure to explain storage infrastructure. I started using the word Data Services Platform before what it is today. And I formed a pictorial scaffold to depict what I wanted to share. This was what I made at that time.
Data Services Platform (circa 2016)- Copyright Heoh Chin Fah
One of the reasons I am bringing this up again is many of the end users and resellers still look at storage from the perspective of capacity, performance and price. And as if two plus two equals five, many storage pre-sales and architects reciprocate with the same type of responses that led to the deteriorated views of the storage technology infrastructure industry as a whole. This situation irks me. A lot.
Data movement is expensive. Not just costs, but also latency and resources as well. Thus there were many narratives to move compute closer to where the data is stored because moving compute is definitely more economical than moving data. I borrowed the analogy of the 2 animals from some old NetApp® slides which depicted storage as the elephant, and compute as birds. It was the perfect analogy, because the storage is heavy and compute is light.
“Close up of a white Great Egret perching on top of an African Elephant aa Amboseli national park, Kenya”
Before the animals representation came about I used to use the term “Data locality, Data Mobility“, because of past work on storage technology in the Oil & Gas subsurface data management pipeline.
Take stock of your data movement
I had recent conversations with an end user who has been paying a lot of dollars keeping their “backup” and “archive” in AWS Glacier. The S3 storage is cheap enough to hold several petabytes of data for years, because the IT folks said that the data in AWS Glacier are for “backup” and “archive”. I put both words in quotes because they were termed as “backup” and “archive” because of their enterprise practice. However, the face of their business is changing. They are in manufacturing, oil and gas downstream, and the definitions of “backup” and “archive” data has changed.
For one, there is a strong demand for reusing the past data for various reasons and these datasets have to be recalled from their cloud storage. Secondly, their data movement activities still mimicked what they did in the past during their enterprise storage days. It was a classic lift-and-shift when they moved to the cloud, and not taking stock of their data movements and the operations they ran on these datasets. Still ongoing, their monthly AWS cost a bomb.
We get an avalanche of multicloud selling from storage vendors. We get promises and benefits of multicloud but from whose point of view?
Multicloud is multiple premises
This is an overly simplistic example how I created 3 copies of the same spreadsheet yesterday. I have a quotation on Google Sheets. A fairly complicated one. Someone wanted it in Excel format, but the format and the formulas were all messed up when I tried to download it as XLSX. What I had to do was to download the Google Sheets as ODS (OpenDocument Spreadsheet) format to my laptop, and then upload the LibreOffice file to my OneDrive account, and use Excel Online to open the ODS file and saved as XLSX. In one fell swoop, I have the same spreadsheet in Google Drive, my laptop and OneDrive. 3 copies in 3 different premises.
As we look to the behaviour of data creation and data acquisition, data sharing and data movement, the central repository is the gold image, the most relevant copy of the data. However, for business reasons, data has to be moved to where the applications are. It could be in cloud A or cloud B or cloud C or it could be on-premises. The processed output from cloud A is stored in cloud A, and likewise, cloud B in cloud B and so on.
To get the most significant and relevant copy, data from all premises must be consolidated, thus it has to be moved to a centralized data storage repository. But intercloud data movement is bogged down by egress fees, latency, data migration challenges (like formats and encoding), security, data clearance policies and many other hoops and hurdles.
With all these questions and concerns in mind, the big question mark is “Is multicloud really practical?” From a storage guy like me who loves a great data management story, “It is not. Multicloud creates storage silos“.
I mentioned that long term digital data preservation is a segment within the data lifecycle which has merits and prominence. SNIA® has proved that this is a strong growing market segment through its 2007 and 2017 “100 Year Archive” surveys, respectively. 3 critical challenges of this long, long-term digital data preservation is to keep the archives
For the longest time, tape technology has been the king of the hill for digital data preservation. The technology is cheap, mature, and many enterprises has built their long term strategy around it. And the pulse in the tape technology market is still very healthy.
The challenges of tape remain. Every 5 years or so, companies have to consider moving the data on the existing tape technology to the next generation. It is widely known that LTO can read tapes of the previous 2 generations, and write to it a generation before. The tape transcription process of migrating digital data for the sake of data preservation is bad because it affects the structural integrity and quality of the content of the data.
In my times covering the Oil & Gas subsurface data management, I have seen NOCs (national oil companies) with 500,000 tapes of all generations, from 1/2″ to DDS, DAT to SDLT, 3590 to LTO 1-7. And millions are spent to transcribe these tapes every few years and we have folks like Katalyst DM, Troika and more hovering this landscape for their fill.
One of the historical feats which had me mesmerized for a long time was the 14-year journey China’s imperial treasures took to escape the Japanese invasion in the early 1930s, sandwiched between rebellions and civil wars in China. More than 20,000 pieces of the imperial treasures took a perilous journey to the west and back again. Divided into 3 routes over a decade and four years, not a single piece of treasure was broken or lost. All in the name of preservation.
Digital data preservation is on another end of the data lifecycle spectrum. More often than not, it is not the part that many pay attention to. In the past 2 decades, digital data has grown so much that it is now paramount to keep the data forever. Mind you, this is not the data hoarding kind but to preserve the knowledge and wisdom which is in the digital content of the data.
SNIA (Storage Networking Industry Association) conducted 2 surveys – one in 2007 and another in 2017 – called the 100 Year Archive, and found that the requirement for preserving digital data has grown multiple folds over the 10 years. In the end, the final goal is to ensure that the perpetual digital contents are
All at an affordable cost. Therefore, SNIA has the vision that the digital content must transcend beyond the storage medium, the storage system and the technology that holds it.
I picked up the latest development news about Quantum Corporation. Last month, in December 2018, they secured a USD210 million financial lifeline to support their deflating business and their debts. And if you follow their development, they are with their 3rd CEO in the past 12 months, which is quite extraordinary. What is happening at Quantum Corp?
Quantum Logo (PRNewsFoto/Quantum Corp.)
Stornext – The Swiss Army knife of Data Management
I have known Quantum since 2000, very focused on the DLT tape library business. At that time, prior to the coming of LTO, DLT and its successor, SuperDLT dominated the tape market together with IBM. In 2006, they acquired ADIC, another tape vendor and became one of the largest tape library vendors in the world. From the ADIC acquisition, Quantum also got their rights on Stornext, a high performance scale out file system. I was deeply impressed with Stornext, and I once called it the Swiss Army knife of Data Management. The versatility of Stornext addressed many of the required functions within the data management lifecycle and workflows, and thus it has made its name in the Media and Entertainment space.
Jack of all trades, master of none
However, Quantum has never reached great heights in my opinion. They are everything to everybody, like a Jack of all trades, master of none. They are backup with their tape libraries and DXi series, archive and tiering with the Lattus, hybrid storage with QXS, and file system and scale-out with Stornext. If they have good business run rates and a healthy pipeline, having a broad product line is fine and dandy. But Quantum has been having CEO changes like turning a turnstile, and amid “a few” accounting missteps and a 2018 CEO who only lasted 5 months, they better steady their rocking boat quickly. Continue reading →
The Register wrote a damning piece about NetApp a few days ago. I felt it was irresponsible because this is akin to kicking a man when he’s down. It is easy to do that. The writer is clearly missing the forest for the trees. He was targeting NetApp’s Clustered Data ONTAP (cDOT) and missing the entire philosophy of NetApp’s mission and vision in Data Fabric.
I have always been a strong believer that you must treat Data like water. Just like what Jeff Goldblum famously quoted in Jurassic Park, “Life finds a way“, data as it moves through its lifecycle, will find its way into the cloud and back.
And every storage vendor today has a cloud story to tell. It is exciting to listen to everyone sharing their cloud story. Cloud makes sense when it addresses different workloads such as the sharing of folders across multiple devices, backup and archiving data to the cloud, tiering to the cloud, and the different cloud service models of IaaS, PaaS, SaaS and XaaS.