Data Management – Page 6

What If – The other side of Storage FUDs

By cfheoh | August 23, 2021 - 8:00 am |August 21, 2021 Backup, Big Data, Business Continuity, Cloud, Data, Data Archiving, Data Availability, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Green Computing, High Performance Computing, Hyperconvergence, Performance Benchmark, Performance Caching, Reliability, Storage Tiering, Virtualization

Marvel Studios’ animated TV series What If…? is now streaming on Disney+. In the first episode, Peggy Carter, instead of Steve Rogers, takes the super soldier serum and becomes the first Avenger. The series explores alternatives and possibilities to what we may have considered precept and the established order of things.

As storage practitioners, we often face certain “dogmatic” arguments that are a mix of measured actuality and marketing magic, aka FUD (fear, uncertainty, doubt). Time and again, we are thrown a curve ball, like “Oh, your competitor can do this. Can you?” Suddenly you feel pinned into a corner, and the pressure to defend your turf rises. You fumble; you have no answer; game over!

I have experienced these hearty objections many times over. The best one was a particular meeting during my early days with NetApp® in 2000. I had been with the company only 1 to 2 months and was still wet behind the ears with the technology. I was pitching SnapMirror® to Ericsson Malaysia when the Scandinavian manager said, “I think you are lying!” I was lost for a response. I fumbled spectacularly, though I can’t remember whether we won or lost that opportunity.

Here are a few I have often encountered. Let’s play the game of What If …?

What If …?

SSOT of Files

By cfheoh | August 9, 2021 - 9:00 am |August 9, 2021 Analytics, API, Artificial Intelligence, Backup, Business Continuity, BYOD, Cloud, Data, Data Archiving, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Digital Transformation, Edge Computing, Filesystems, Fog Computing, Google, Google Anthos, Industry 4.0, Machine Learning, Object Storage, Tape storage

[ This is Part Two of “Where are your files living now?”. You can read Part One here ]

“Data locality, data mobility.” It is a phrase I like to use a lot when describing data consolidation, and it led to my discussion of files and folders, and where they live, in my previous blog. Today, files and folders can be everywhere, scattered across a plethora of premises, and that stretches the premise of SSOT (Single Source of Truth). This expatriation of files with minimal checks and balances disturbs me.

A year ago, just before I joined iXsystems, I was given embargoed news from Google®, probably a week before it announced BigQuery Omni. I was then interviewed by Enterprise IT News, a local Malaysian technology news portal, to provide an opinion quote. This was what I said:

“’The data warehouse in the cloud’ managed services of Big Query is underpinned by Google® Anthos, its hybrid cloud infra and service management platform based on GKE (Google® Kubernetes Engine). The containerised applications, both on-prem and in the multi-clouds, would allow Anthos to secure and orchestrate infra, services and policy management under one roof.”

I added: “The data repositories remain in each cloud is good to address data sovereignty, data security concerns but it did not mention how it addresses ‘single source of truth’ across multi-clouds.”
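To make the SSOT concern concrete, here is a minimal Python sketch, assuming every premise’s copy of a file can be read as bytes and that one copy has been designated the gold copy. The function `find_divergent_copies` and the premise labels are hypothetical; it only compares SHA-256 digests to flag copies that no longer match the designated source of truth.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def find_divergent_copies(gold: bytes, copies: dict[str, bytes]) -> list[str]:
    """Return the premises whose copy differs from the gold copy.

    `copies` maps a premise name (hypothetical labels such as "cloud-a")
    to the bytes of that premise's copy of the same logical file.
    """
    gold_digest = sha256_of(gold)
    return sorted(
        premise for premise, data in copies.items()
        if sha256_of(data) != gold_digest
    )


if __name__ == "__main__":
    gold = b"quarterly figures v3"
    copies = {
        "on-prem": b"quarterly figures v3",
        "cloud-a": b"quarterly figures v3",
        "cloud-b": b"quarterly figures v2",  # a stale copy left behind
    }
    print(find_divergent_copies(gold, copies))
```

A matching digest only says the bytes agree. It does not decide which copy ought to be the truth in the first place, which is the harder governance question raised above.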

Single Source of Truth – regardless of repositories

Where are your files living now?

By cfheoh | August 2, 2021 - 9:00 am |August 1, 2021 Algorithm, Analytics, Appliance, Artificial Intelligence, Backup, Big Data, Business Continuity, BYOD, Cloud, Data Availability, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Edge Computing, EMC, Filesystems, Google, Google Anthos, Hyperconvergence, Interica, Machine Learning, Microsoft, Microsoft Azure, NAS, NetApp, WAFS, Wide Area File System

[ This is Part One of a longer conversation ]

In the 2000s, EMC² (before the Dell® acquisition) had a tagline, “Where Information Lives™”**. This was before the time of cloud storage. The tagline was an adage of enterprise data storage, fitting the persistent narrative of the time: data consolidation. Within the data consolidation stories, thousands of files and folders moved about the organizations’ networks, from servers to clients and from clients to servers. NAS (Network Attached Storage) was, and still is, the workhorse of many, many organizations.

[ **Side story ] There was an internal anti-EMC joke within NetApp® called “Information has a new address”.

EMC tagline “Where Information Lives”

This was a time when there were almost no concerns about Shadow IT, ransomware was less known, and most importantly, almost everyone knew, more or less, where their files and folders were (except in Oil & Gas upstream, a story to be told later in this blog). That was because there were concerted attempts to consolidate data, and by extension files and folders, in the organization.

Even when these organizations were spread across the world, there were distributed file technologies at the time that could deliver files and folders in an acceptable manner. Definitely not as good as what we have today in a cloudy world, but acceptable. I personally worked on a project setting up the Andrew File System (AFS) for Intel® in Penang in the mid-90s, almost joined Tacit Networks in the mid-2000s, and dabbled in Microsoft® Distributed File System with NetApp® and Windows File Servers while fixing the mountains of issues in deploying the worldwide GUSto (Global Unified Storage) Project at Shell in 2006. Somewhere in that chronology were also Acopia Networks (acquired by F5) and, of course, EMC² Rainfinity and NetApp® Virtual File Manager, the NuView OEM.
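The common idea behind many of these distributed file technologies is a logical namespace decoupled from physical location: clients see one folder tree, and the namespace layer resolves each path to whichever server actually holds it. The Python sketch below shows only that idea with a longest-prefix lookup; the namespace entries and server names are hypothetical, and it is not how AFS, Microsoft® DFS, Acopia or any other specific product implements it.

```python
from pathlib import PurePosixPath

# Hypothetical namespace: logical folder -> physical server:/share.
NAMESPACE = {
    "/corp": "nas-kl:/vol/corp",
    "/corp/engineering": "nas-penang:/vol/eng",
    "/corp/finance": "filer-ams:/vol/fin",
}


def resolve(logical_path: str, namespace: dict[str, str] = NAMESPACE) -> str:
    """Map a logical path to a physical location via longest-prefix match."""
    path = PurePosixPath(logical_path)
    # Walk from the full path up towards the root; the first hit is the
    # most specific (longest) namespace entry that covers this path.
    for candidate in [path, *path.parents]:
        target = namespace.get(str(candidate))
        if target is not None:
            remainder = path.relative_to(candidate)
            return target if str(remainder) == "." else f"{target}/{remainder}"
    raise KeyError(f"{logical_path} is not in the namespace")


if __name__ == "__main__":
    for p in ("/corp/engineering/cad/part.dwg", "/corp/hr/policy.pdf"):
        print(p, "->", resolve(p))
```

As long as the organization owns that one namespace table, it knows where every file lives, which is exactly the grip described next.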

The point I am trying to make here is that most IT organizations had a good grip on where the files and folders were. I do not think this is true anymore. Do you know where your files and folders are living today?

Enterprise Storage is not just a Label

By cfheoh | July 26, 2021 - 9:00 am |July 23, 2021 Appliance, Cloud, Data, Data Archiving, Data Availability, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Digital Transformation, ILM, Jon Toigo, RAID, Reliability, SATA, Security, Software Defined Storage, Unified Storage, Virtualization

I have many anecdotes around the topic of Enterprise Storage, but the conversations in the past 2 weeks made it important for me to share this.

Enterprise Storage is …

Amusing, painful, angry

I get riled up whenever people do not want to be educated about Enterprise Storage. Here are a few stories from the last 2 weeks.

[ Story #1 ]

A guy was building his own storage for cryptocurrency. His supplier told him that the RAID card was enterprise-grade, and that he could get the best performance using “Enterprise” RAID-0.

Well, the “Enterprise” RAID-0 volume crashed, and he lost all his data. Painfully, he said he lost a hefty sum financially.
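Why RAID-0 is such a gamble, regardless of the label on the card: it stripes without parity or mirroring, so one failed drive takes the whole volume with it. A minimal Python sketch of the arithmetic, assuming drive failures are independent and using a hypothetical 2% annual failure probability per drive:

```python
def raid0_annual_loss_probability(drives: int, drive_afr: float) -> float:
    """Probability that a RAID-0 set loses all data within a year.

    RAID-0 has no redundancy, so the volume survives only if every
    drive survives. Assumes independent failures, each drive failing
    within the year with probability `drive_afr`.
    """
    if drives < 1 or not 0.0 <= drive_afr <= 1.0:
        raise ValueError("need drives >= 1 and 0 <= drive_afr <= 1")
    all_survive = (1.0 - drive_afr) ** drives
    return 1.0 - all_survive


if __name__ == "__main__":
    afr = 0.02  # hypothetical 2% annual failure probability per drive
    for n in (1, 4, 8, 12):
        print(f"{n:>2} drives: {raid0_annual_loss_probability(n, afr):.1%}")
```

With that hypothetical 2% figure, an 8-drive stripe comes out at roughly a 15% chance of losing everything in a year, and the risk only grows as more drives are added for performance.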

[ Story #2 ]

A media company complained about the reliability of its previous storage vendor. The GM was shopping around and was told that there are “Enterprise” SATA drives whose reliability is as good as, if not better than, SAS drives.

The company wanted a fully reliable Enterprise Storage system with 99.999% availability, and yet the SATA interface was not designed for building highly reliable enterprise storage. The GM insisted on using “Enterprise” SATA drives for his “enterprise” storage system instead of SAS.
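It helps to translate “five nines” into time. Counting a 365-day year (leap years ignored, an assumption of this sketch), 99.999% availability leaves a budget of about 5.26 minutes of downtime per year. A small Python helper for the conversion:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target:.3%} -> {minutes:,.1f} min/year")
```

A budget of about 5 minutes a year leaves little room for drive replacements and rebuilds, which is why the drive and interface choices matter.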

[ Story #3 ]

An IT admin of a manufacturing company claimed that they had had an “Enterprise Storage” system for a few years, and he could not figure out why its hard disk drives died every 12 to 15 months.

He eventually found out that the drives supplied by his vendor were consumer SATA drives, even though he had been told it was an “Enterprise Storage” system when he bought it.

Rethinking File Security Fundamentals

By cfheoh | May 24, 2021 - 9:00 am |May 24, 2021 Algorithm, Analytics, API, Artificial Intelligence, Business Continuity, Data Availability, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Disaster Recovery, eDiscovery, Filesystems, iRODS, Machine Learning, Object Storage, Snapshots, Virtualization

I took a week off blogging last week, but the lazy days were inundated with bad news: a few more devastating ransomware attacks. This time, Colonial Pipeline in the US was hacked and its networks were shut down by ransomware. These ransomware threats are never-ending, and they are getting more damaging than ever. It is like trying to plug a leaking boat with your hands, and more leaks appear as you plug them.

More ransomware news hitting healthcare around the world last week:

[ May 15, 2021 ] Ireland’s health service hit by ‘significant’ ransomware attack
[ May 20, 2021 ] Irish hospitals are latest to be hit by ransomware attacks
[ May 19, 2021 ] Ransomware attacks hit AXA’s Asia unit, New Zealand health provider
[ May 20, 2021 ] Ransomware attacks are spiking. Is your company prepared?
[ May 20, 2021 ] RansomCloud: It’s new, it’s here now and it’s coming to a server near you

We are forever chasing a solution, and forever losing, because almost all technology defenses that protect data against ransomware are reactive. Why is ransomware still such a big threat then? It is time to rethink file security fundamentals.

Data everywhere

Blasphemous technical writing

By cfheoh | April 12, 2021 - 9:00 am |April 9, 2021 Cloud, Data Management, Disks, Hyperconvergence

This is so, so, so wrong! I want to hold back but I can’t hold back no more!

This article from Petapixel appeared in my daily news feed last week. When I saw the title “Seagate performed best in Backblaze’s 2020 Hard Drive Failure Report”, I literally jumped. My immediate thought was “This can’t be right”.

Labelling Seagate as the best performer in a Backblaze report did not just sound oxymoronic. It was moronic. Those of us with industry experience know enough that such a sweeping, one-fell-swoop statement cannot be true.

Petapixel misleading article title

Backblaze report

Backblaze has been releasing its Hard Drive Stats reports every quarter since 2013. For many of us practitioners, the report has been the de facto standard and indicator of hard disk reliability. Inadvertently, it also defines the perceived quality of the hard disk drives associated with each manufacturer’s brands and models.
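Backblaze computes its annualized failure rate (AFR) from drive days and failures: AFR = failures / (drive days / 365) × 100%. The Python sketch below applies that formula to two made-up fleets (not taken from any Backblaze report) to show why a single headline ranking needs context: both fleets post the same AFR, but one rests on 250 times more evidence.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    return 100.0 * failures * 365 / drive_days


if __name__ == "__main__":
    # Hypothetical fleets for illustration only.
    fleets = {
        "model-x (about 25,000 drives for a year)": (250, 9_125_000),
        "model-y (about 100 drives for a year)": (1, 36_500),
    }
    for model, (failures, days) in fleets.items():
        print(f"{model}: AFR {annualized_failure_rate(failures, days):.2f}%")
```

Both print an AFR of 1.00%, yet a single extra failure doubles the small fleet’s AFR to 2.00% while the large fleet barely moves. A model’s drive count and drive days matter as much as its AFR.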

The other pandemic – Datanemic

By cfheoh | April 5, 2021 - 9:00 am |April 4, 2021 Algorithm, Analytics, Artificial Intelligence, Business Continuity, Cloud, Data, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Disaster Recovery, IoT, Machine Learning, Uncategorized

It is a disaster. No matter what we do, the leaks and the cracks are appearing faster than we can fix them. It is a global pandemic.

I am not talking about COVID-19, the pandemic that has affected our lives and livelihoods for over a year. I am talking about the other pandemic: the compromise of data security.

In the past 6 months, the data leaks, the security hacks, the ransomware scourge have been more devastating than ever. Here are a few big ones that happened on a global scale:

[ Thru 2020 ] SolarWinds Supply Chain Hack (aka Sunburst)
[ March 2021 ] Microsoft® Exchange Hack
[ March 2021 ] Acer® Ransomware Attack
[ April 2021 ] Asteelflash Electronics Ransomware Attack

Data Security Breach, Cyber Attack, Ransomware

Closer to home, here in South East Asia, we have:

[ March 2021 ] Malaysia Airlines Data Breach
[ March 2021 ] Singapore Airlines Data Security Breach

When you buy storage solutions on price alone

By cfheoh | March 1, 2021 - 9:00 am |February 28, 2021 Appliance, Data Management, Data Protection, Data Security, Digital Transformation, Storage Optimization

Most people won’t bat an eye buying a car. It is a status symbol for many, but there is a great disparity between the value of the work the car returns and the cost of buying it. Furthermore, a car depreciates quickly, making the “investment” more like an act of losing money fast.

So the story begins. When it comes to buying a storage technology platform, the initial price on the quote more or less decides the outcome. The reply of “Too expensive!”, with little consideration of the value returned relative to the initial buying price, is far too frequent.

There have to be more considerations about these values. Here are some to weigh when buying a storage technology platform, besides just the initial price.

Performance

One recent conversation was about Intel® Optane™ vs NAND Flash. A well-known online eCommerce proprietor in South East Asia decided to go against the grain, and went for the more “expensive” Optane™ instead of getting an array of NAND Flash NVMe SSDs.
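A quote’s headline price hides the value side. The Python sketch below normalizes two options by cost per usable GB and cost per thousand IOPS. Every price, capacity and IOPS figure in it is hypothetical, chosen only to show the method, and none of it is vendor data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    price: float       # hypothetical purchase price
    usable_gb: float   # hypothetical usable capacity in GB
    iops: float        # hypothetical sustained random IOPS

    @property
    def cost_per_gb(self) -> float:
        return self.price / self.usable_gb

    @property
    def cost_per_kiops(self) -> float:
        return self.price / (self.iops / 1000)


if __name__ == "__main__":
    options = [
        Option("Option A (NAND flash NVMe)", price=100_000, usable_gb=200_000, iops=400_000),
        Option("Option B (Optane)", price=150_000, usable_gb=30_000, iops=1_500_000),
    ]
    for opt in options:
        print(f"{opt.name}: {opt.cost_per_gb:.2f}/GB, {opt.cost_per_kiops:.2f}/kIOPS")
```

With these made-up numbers, Option B costs 10 times more per GB but 2.5 times less per thousand IOPS. Which metric should win depends on whether the workload is bound by capacity or by IOPS and latency.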

Multicloud is sprouting Storage Silos

By cfheoh | February 15, 2021 - 9:00 am |February 14, 2021 Backup, Business Continuity, Cloud, Data, Data Archiving, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Disaster Recovery

Grain Silos

We get an avalanche of multicloud selling from storage vendors. We hear the promises and benefits of multicloud, but from whose point of view?

Multicloud is multiple premises

Here is an overly simplistic example of how I created 3 copies of the same spreadsheet yesterday. I have a quotation on Google Sheets, a fairly complicated one. Someone wanted it in Excel format, but the formatting and the formulas were all messed up when I tried to download it as XLSX. What I had to do was download the Google Sheets file in ODS (OpenDocument Spreadsheet) format to my laptop, upload the ODS file to my OneDrive account, and use Excel Online to open it and save it as XLSX. In one fell swoop, I had the same spreadsheet in Google Drive, on my laptop and in OneDrive: 3 copies in 3 different premises.

As we look at the behaviour of data creation and data acquisition, data sharing and data movement, the central repository is the gold image, the most relevant copy of the data. However, for business reasons, data has to be moved to where the applications are. It could be in cloud A or cloud B or cloud C or it could be on-premises. The processed output from cloud A is stored in cloud A, and likewise, cloud B in cloud B and so on.

To get the most significant and relevant copy, data from all premises must be consolidated, thus it has to be moved to a centralized data storage repository. But intercloud data movement is bogged down by egress fees, latency, data migration challenges (like formats and encoding), security, data clearance policies and many other hoops and hurdles.
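The egress part of consolidation is simple arithmetic, but it recurs every time data moves back to the central repository. A minimal Python sketch, assuming a flat per-GB egress rate for each cloud; the cloud names, rates and volumes are hypothetical placeholders, not any provider’s published pricing (real pricing is tiered and changes over time).

```python
# Hypothetical flat egress rates in USD per GB; real provider pricing is tiered.
EGRESS_RATE_PER_GB = {"cloud-a": 0.09, "cloud-b": 0.08, "cloud-c": 0.12}


def consolidation_egress_cost(gb_out: dict[str, float]) -> float:
    """Total egress cost to pull each cloud's output back to a central repository."""
    return sum(gb * EGRESS_RATE_PER_GB[cloud] for cloud, gb in gb_out.items())


if __name__ == "__main__":
    # Hypothetical monthly processed output left in each cloud.
    monthly_output_gb = {"cloud-a": 5_000, "cloud-b": 12_000, "cloud-c": 2_500}
    cost = consolidation_egress_cost(monthly_output_gb)
    print(f"Monthly consolidation egress: ${cost:,.2f}")
```

With these placeholder figures, the bill is about $1,710 every month, before counting latency, format conversion, security and data clearance policies.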

With all these questions and concerns in mind, the big question is “Is multicloud really practical?” For a storage guy like me who loves a great data management story, “It is not. Multicloud creates storage silos”.

Layers in Storage – For better or worse

By cfheoh | February 8, 2021 - 9:00 am |February 6, 2021 API, Appliance, Ceph, CIFS, Cloud, Clusters, Containers, Data Management, Data Protection, Disks, Docker, EMC, Fibre Channel, Filesystems, FreeNAS, Gluster, Hyperconvergence, iSCSI, iXsystems, Kubernetes, Linux, NAS, NetApp, NFS, Object Storage, Openstack, OpenZFS, RAID, SMB, Snapshots, SNIA, Software Defined Storage, Storage Optimization, TrueNAS, Unified Storage, Virtualization, VMware

Storage arrays and storage services are built upon layers and layers of architecture. The physical components, hard disk drives and solid state drives, are abstracted into RAID volumes and virtualized into other storage constructs before they are exposed to the network as shares/exports, LUNs or objects.

Everyone in the storage networking industry is cognizant of these layers; they are the foundation of our knowledge and experience. The public cloud storage services side is the same, albeit more opaque. Nevertheless, both have layers.

In the early 2000s, the SNIA® Technical Council outlined a blueprint of the SNIA® Shared Storage Model, a framework describing the layers and properties of a storage system and its services. It was similar to the OSI 7-layer model for networking. The framework helped many industry professionals and practitioners shape their understanding and develop knowledge in their respective fields. The layering scheme of the SNIA® Shared Storage Model is shown below:

SNIA Shared Storage Model – The layering scheme

Storage vendors layering scheme

While the SNIA® storage layers were generic and open, each storage vendor had its own proprietary implementation of storage layers. Some of these architectures are simple, but some I find a bit too complex and convoluted.

Here is an example of the layers of the Automated Volume Management (AVM) architecture of the EMC® Celerra®.

EMC Celerra AVM Layering Scheme

I would often scratch my head about AVM. Disks were grouped into RAID groups, which were presented as LUNs (Logical Unit Numbers). These were then defined as Celerra® dvols (disk volumes), and stripes of the dvols were consolidated into a storage pool.

From the pool, a piece of storage capacity, called a slice volume, was combined with other slice volumes into a metavolume, which was eventually presented as a file system to the network and its NAS clients. Explaining this took real effort, and it was my job to explain it, as I was the IP Storage product manager for EMC® between 2007 and 2009. It was a far cry from the simplicity of the NetApp® ONTAP 7 architecture of RAID groups and volumes, and the WAFL® (Write Anywhere File Layout) filesystem.

Another complicated layered framework I often gripe about is Ceph. Here is a look at how the layers of CephFS are constructed.

Ceph Storage Layered Framework

I work with the OpenZFS filesystem a lot. It is something I am rather familiar with, and the layered structure of the ZFS filesystem is fundamentally simpler.
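Counting the layers makes the contrast concrete. The Python sketch below lists the constructs named in the text for Celerra® AVM and, for comparison, the usual OpenZFS chain (disks grouped into mirror or RAIDZ vdevs, vdevs into a pool, datasets or zvols carved from the pool). It is a descriptive model only, not either product’s internals.

```python
# Bottom-up layer chains. AVM follows the description in the text; OpenZFS
# follows the commonly described vdev/pool/dataset hierarchy.
CELERRA_AVM = [
    "disks",
    "RAID group (presented as a LUN)",
    "dvol (disk volume)",
    "stripe of dvols",
    "storage pool",
    "slice volume",
    "metavolume",
    "file system",
]

OPENZFS = [
    "disks",
    "vdev (mirror or RAIDZ)",
    "zpool",
    "dataset (file system) or zvol",
]


def render(stack: list[str]) -> str:
    """Render a bottom-up layer chain as a single arrow-separated line."""
    return " -> ".join(stack)


if __name__ == "__main__":
    for name, stack in (("Celerra AVM", CELERRA_AVM), ("OpenZFS", OPENZFS)):
        # Intermediate layers sit between the physical disks and what clients see.
        print(f"{name}: {len(stack) - 2} intermediate layers")
        print("  " + render(stack))
```

Six intermediate constructs versus two: each one is something to size, monitor and troubleshoot.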

Storage architecture mixology

Engineers can get bizarre when they get too creative. They have a can-do attitude that sometimes transcends the boundaries of practicality and boggles many minds. This is what happens when they come up with their own mixology ideas.

Recently I spoke to two magnanimous persons who had the idea of providing Ceph iSCSI LUNs to the ZFS filesystem in order to use the simple NAS file sharing capabilities of TrueNAS® CORE. In their own words, Ceph NAS capabilities sucked. I had to draw their whole idea out in PowerPoint, and this is the architecture I got from the conversation.

There are 3 different storage subsystems here just to provide NAS. As if Ceph layers aren’t complicated enough, the iSCSI LUNs from Ceph are presented as Cinder volumes to the KVM hypervisor (or VMware® ESXi) through the Cinder driver. Cinder is the persistent block storage subsystem of the OpenStack® project. The Cinder volumes/hypervisor datastores are virtualized as vdisks to the respective VMs installed with TrueNAS® CORE and the OpenZFS filesystem. From TrueNAS® CORE, shares and exports are provisioned via the SMB and NFS protocols to Windows and Linux clients respectively.

It works! As I was told, it worked!

A.P.P.A.R.M.S.C. considerations

Continuing from the layered framework described above for NAS, other aspects besides the technical work have to be considered, even when it works technically.

I often use a set of diligent data storage focal points when considering a good storage design and implementation. This is A.P.P.A.R.M.S.C. Take, for instance, Protection as one of the points, with snapshots as the technology to use.

Snapshots can be executed at the ZFS level in the TrueNAS® CORE subsystem. Snapshots can be triggered at the volume level in the OpenStack® subsystem and, likewise, as rbd snapshots in the Ceph subsystem. The question is, which snapshot at which storage subsystem is the most valuable to the operations and the business? Do you run all 3 snapshots? How do you execute them in succession in a scheduled policy?
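One way to reason about running all 3 snapshots in succession is to write the schedule down as an ordered plan. The Python sketch below only builds the command lines for each layer and executes nothing. The dataset, volume and image names are hypothetical, and the top-down order (filesystem first, so the lower-layer snapshots capture a disk image that already contains a consistent ZFS snapshot) is one assumed policy among several. The CLI forms are `zfs snapshot`, `openstack volume snapshot create` and `rbd snap create`; check them against your own versions before use.

```python
from datetime import datetime, timezone

# Hypothetical names for each layer of the stack described above.
ZFS_DATASET = "tank/shares"          # OpenZFS dataset inside the TrueNAS CORE VM
CINDER_VOLUME = "truenas-data-vol"   # Cinder volume backing the VM's vdisk
RBD_IMAGE = "volumes/volume-1234"    # Ceph pool/image behind that volume


def snapshot_plan(tag: str) -> list[list[str]]:
    """Return snapshot commands in execution order, top layer first.

    Nothing is executed here; a scheduler would run and verify each
    step, for example with subprocess.run(cmd, check=True).
    """
    return [
        ["zfs", "snapshot", f"{ZFS_DATASET}@{tag}"],
        # --force is needed to snapshot a volume that is attached (in-use).
        ["openstack", "volume", "snapshot", "create",
         "--volume", CINDER_VOLUME, "--force", f"snap-{tag}"],
        ["rbd", "snap", "create", f"{RBD_IMAGE}@{tag}"],
    ]


if __name__ == "__main__":
    tag = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
    for step, cmd in enumerate(snapshot_plan(tag), start=1):
        print(step, " ".join(cmd))
```

If Cinder sits on its RBD driver, a Cinder volume snapshot is, as far as I understand it, implemented as an RBD snapshot anyway, so steps 2 and 3 may duplicate each other. That overlap is part of the question of which layer’s snapshot is actually worth keeping.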

In terms of performance, can it truly maximize its potential? Can it churn out the best IOPS, and deliver at wire speed? What is the latency we can expect with so many layers from 3 different storage subsystems?
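Every layer in the I/O path adds its own latency, and a request pays for each layer in turn. The back-of-the-envelope Python sketch below sums per-layer latencies and derives the IOPS ceiling for a single outstanding request (queue depth 1 means IOPS is at most 1 divided by latency). The latency values are placeholders, not measurements of Ceph, iSCSI, a hypervisor or ZFS.

```python
# Hypothetical per-request latencies in microseconds; placeholders only.
LAYERS_US = {
    "SMB/NFS client to TrueNAS VM": 150,
    "OpenZFS inside the VM": 60,
    "vdisk / hypervisor datastore": 40,
    "iSCSI from Ceph": 200,
    "Ceph RBD (replicated write to OSDs)": 500,
}


def end_to_end_latency_us(layers: dict[str, float]) -> float:
    """Serial I/O path: each request pays every layer's latency in turn."""
    return sum(layers.values())


def qd1_iops(latency_us: float) -> float:
    """Upper bound on IOPS with one outstanding request (queue depth 1)."""
    return 1_000_000 / latency_us


if __name__ == "__main__":
    total = end_to_end_latency_us(LAYERS_US)
    print(f"end-to-end: {total:.0f} us, QD1 ceiling: {qd1_iops(total):,.0f} IOPS")
```

Deeper queues can hide some of this, but per-request latency never gets shorter than the sum of the layers it crosses.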

And supporting this architecture would be a nightmare. Where do you even start troubleshooting?

Those are just a few considerations and questions to think about when such a layered storage architecture comes along. IMHO, such a design was over-engineered. I was tempted to say, “Just because you can, doesn’t mean you should.”

Elegance in Simplicity

Einstein (I think) said:

Einstein’s quote on simplicity and complexity

I am not saying that having too many layers is wrong. A heavily layered architecture works for many storage solutions out there, where the layers are often masked by a simple and intuitive UI. But from yours truly’s point of view, as a storage architecture enthusiast and connoisseur, there is beauty and elegance in simple designs.

The purpose here is to promote better understanding of the storage layers, and how they integrate and interact with each other to deliver the data services to the network. In the end, that is how most storage architectures are built.
