Last week was consumed by many conversations on this topic. I was quite jaded, really. Unfortunately many still take a very simplistic view of all the storage technology, or should I say over-marketing of the storage technology. So much so that the end users make incredible assumptions of the benefits of a storage array or software defined storage platform or even cloud storage. And too often caveats of turning on a feature and tuning a configuration to the max are discarded or neglected. Regards for good storage and data management best practices? What’s that?
I share some of my thoughts handling conversations like these and try to set the right expectations rather than overhype a feature or a function in the data storage services.
Complex data networks and the storage services that serve it
Applications and workloads (A&W) read and write from the data storage services platforms. These could be local DAS (direct access storage), network storage arrays in SAN and NAS, and now objects, or from cloud storage services. Regardless of structured or unstructured data, different A&Ws have different behavioural I/O patterns in accessing data from storage. Therefore storage has to be configured at best to match these patterns, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:
Random and Sequential patterns
Block sizes of these A&Ws ranging from typically 4K to 1024K.
Causal effects of synchronous and asynchronous I/Os to and from the storage
Of course, I am one of these critics. I don’t deny that I am not. But I read this situation from a multicloud hyperbole of which I am not a fan. Too much multicloud whitewashing by vendors trying to pitch multicloud as a disaster recovery solution without understanding that this is easier said than done.
Data movement is expensive. Not just costs, but also latency and resources as well. Thus there were many narratives to move compute closer to where the data is stored because moving compute is definitely more economical than moving data. I borrowed the analogy of the 2 animals from some old NetApp® slides which depicted storage as the elephant, and compute as birds. It was the perfect analogy, because the storage is heavy and compute is light.
“Close up of a white Great Egret perching on top of an African Elephant aa Amboseli national park, Kenya”
Before the animals representation came about I used to use the term “Data locality, Data Mobility“, because of past work on storage technology in the Oil & Gas subsurface data management pipeline.
Take stock of your data movement
I had recent conversations with an end user who has been paying a lot of dollars keeping their “backup” and “archive” in AWS Glacier. The S3 storage is cheap enough to hold several petabytes of data for years, because the IT folks said that the data in AWS Glacier are for “backup” and “archive”. I put both words in quotes because they were termed as “backup” and “archive” because of their enterprise practice. However, the face of their business is changing. They are in manufacturing, oil and gas downstream, and the definitions of “backup” and “archive” data has changed.
For one, there is a strong demand for reusing the past data for various reasons and these datasets have to be recalled from their cloud storage. Secondly, their data movement activities still mimicked what they did in the past during their enterprise storage days. It was a classic lift-and-shift when they moved to the cloud, and not taking stock of their data movements and the operations they ran on these datasets. Still ongoing, their monthly AWS cost a bomb.
We often hear the word “modernization” thrown around these days. The push is to get the end user to refresh their infrastructure, and the storage infrastructure market is rife with modernization word. Is your storage ripe for “modernization“?
Many possibilities to modernize storage
To modernize, it has to be relative to legacy storage hardware, and the operating environment that came with it. But if the so-called “legacy” still does the job, should you modernize?
Big Data is right
When the word “Big Data” came into prominence a while back, it stirred the IT industry into a frenzy. At one point, Apache Hadoop became the poster elephant (pun intended) for this exciting new segment. So many Vs came out, but I settled with 4 Vs as the framework of my IT conversations. The 4Vs we often hear are:
The world has pretty much settled that hybrid cloud is the way to go for IT infrastructure services today. Straddled between the enterprise data center and the infrastructure-as-a-service in public cloud offerings, hybrid clouds define the storage ecosystems and architecture of choice.
I took a week off blogging last week but the lazy days were inundated by bad news. A few more devastating ransomware attacks. This time, Colonial Pipeline in the US was hacked and its networks were shutdown by ransomware. These ransomware threats are never ending, and they are getting more damaging than ever. It is like trying to plug a leaking boat with your hands, and more leaks appear as you plug them.
More ransomware news hitting healthcare around the world last week:
We are forever chasing for a solution, forever losing because almost all technology defenses to protect the data against ransomware are reactive. Why is ransomware still such a big threat then? Time to rethink file security fundamentals.
We get an avalanche of multicloud selling from storage vendors. We get promises and benefits of multicloud but from whose point of view?
Multicloud is multiple premises
This is an overly simplistic example how I created 3 copies of the same spreadsheet yesterday. I have a quotation on Google Sheets. A fairly complicated one. Someone wanted it in Excel format, but the format and the formulas were all messed up when I tried to download it as XLSX. What I had to do was to download the Google Sheets as ODS (OpenDocument Spreadsheet) format to my laptop, and then upload the LibreOffice file to my OneDrive account, and use Excel Online to open the ODS file and saved as XLSX. In one fell swoop, I have the same spreadsheet in Google Drive, my laptop and OneDrive. 3 copies in 3 different premises.
As we look to the behaviour of data creation and data acquisition, data sharing and data movement, the central repository is the gold image, the most relevant copy of the data. However, for business reasons, data has to be moved to where the applications are. It could be in cloud A or cloud B or cloud C or it could be on-premises. The processed output from cloud A is stored in cloud A, and likewise, cloud B in cloud B and so on.
To get the most significant and relevant copy, data from all premises must be consolidated, thus it has to be moved to a centralized data storage repository. But intercloud data movement is bogged down by egress fees, latency, data migration challenges (like formats and encoding), security, data clearance policies and many other hoops and hurdles.
With all these questions and concerns in mind, the big question mark is “Is multicloud really practical?” From a storage guy like me who loves a great data management story, “It is not. Multicloud creates storage silos“.
How did it become that way? How did AWS Storage became numero uno?
I became interested in the Flywheel concept some years back. It was conceived in Jim Collins’ book, “Good to Great” almost 20 years ago, and since then, Amazon.com has become the real life enactment of the Flywheel concept.
Amazon.com Flywheel – How each turn becomes sturdier, brawnier.
Every turn of the flywheel requires the same amount of effort although in the beginning, the noticeable effect is minuscule. But as every turn gains momentum, the returns of each turn scales greater and greater to the fixed efforts of operating a single turn.
The Enterprise File Sync and Share (EFSS) EasiShare presence is growing rapidly in the region, as enterprises and organizations are quickly redefining the boundaries of the new workspace. Work files and folders are no longer confined to the shared network drives within the local area network. It is going beyond to the “Work from Anywhere” phenomenon that is quickly becoming the way of life. Breaking away from the usual IT security protection creates a new challenge, but EasiShare was conceived with security baked into its DNA. With the recent release, Version 10, file sharing security and resiliency are stronger than ever.
[ Note: I have blogged about EasiShare previously. Check out the 2 links below ]
Public clouds are the obvious choice but for organizations to protect their work files, and keep data secure, services like Dropbox for Business, Microsoft® Office 365 with OneDrive and Google® Workspace are not exactly the kind of file sharing with security as their top priority. A case in point was the 13-hour disruption to Wasabi Cloud last week, where the public cloud storage provider’s domain name, wasabisys.com, was suspended by their domain name registrar because of malware discrepancy at one of its endpoints. There were other high profile cases too.
This is where EasiShare shines, because it is a secure, private EFSS solution for the enterprise and beyond, because business resiliency is in the hands and control of the organization that owns it, not the public cloud service providers.
EasiShare unifies with TrueNAS for secure business resiliency
EasiShare is just one several key business solutions iXsystems™ in Asia Pacific Japan is working closely with, and there is a strong, symbiotic integration with the TrueNAS® platform. Both have strong security features that fortify business resiliency, especially when facing the rampant ransomware scourge.
Value of a Single Unified Data Services Platform
A storage array is not a solution. It is just a box that most vendors push to sell. A storage must be a Data Services Platform. Readers of my blog would know that I have spoken about the Data Services Platform 3 years ago and you can read about it: