Fibre Channel Protocol in a Zero Trust world

Fibre Channel SANs (storage area networks) are touted as more secure than IP-based storage networks. In a way, that is true because Fibre Channel is a distinct network separated from the mainstream client-based applications. Moreover, the Fibre Channel protocol is entirely different from IP, and the deep understanding of the protocol, its implementations are exclusive to a selected cohort of practitioners and professionals in the storage technology industry.

The data landscape has changed significantly compared to the days where FC SANs were dominating the enterprise. The era was the mid 90s and early 2000s. EMC® was king; IBM® Shark was a top-tier predator; NetApp® was just getting over its WAFL™ NAS overdose to jump into Fibre Channel. There were other fishes in the Fibre Channel sea.

But the sands of storage networking have been shifting. Today, data is at the center of the universe. Data is the prized possession of every organization, and has also become the most coveted prize for data thieves, threat actors and other malefactors in the digital world. The Fibre Channel protocol has been changing too, under its revised specifications and implementations through its newer iterations in the past decade. This change in advancement of Fibre Channel as a storage networking protocol is less often mentioned, but nevertheless vital in the shift of the Fibre Channel SANs into a Zero Trust world.

Zones, masks and maps

Many storage practitioners are familiar with the type of security measures employed by Fibre Channel in the yesteryears. And this still rings true in many of the FC SANs that we know of today. For specific devices to connect to each other, from hosts to the storage LUNs (logical unit numbers), FC zoning must be configured. This could be hard zoning or soft zoning, where the concept involves segmentation and the grouping of configured FC ports of both the ends to “see” each other and to communicate, facilitated by the FC switches. These ports are either the initiators or the storage target, each with its own unique WWN (World Wide Name).

On top of zoning, storage practitioners also configure LUN masking at the host side, where only certain assigned LUNs from the storage array is “exposed” to the specific host initiators. In conjunction, at the storage array side, the LUNs are also associated to only a group of host initiators that are allowed to connect to the selected LUNs. This is the LUN mapping part.

Continue reading

Truthful information under attack. The call for Data Preservation

The slogan of The Washington Post is “Democracy Dies in Darkness“. Although not everyone agrees with the US brand of democracy, the altruism of WaPo‘s (the publication’s informal name) slogan is a powerful one. The venerable newspaper remains the beacon in the US as one of the most trustworthy sources of truthful, honest information.

4 Horsemen of Apocalypse with the 5th joining

Misinformation

Misinformation has become a clear and present danger to humanity. Fake news, misleading information, lies are fueling and propelling the propaganda and agenda of the powerful (and the deranged). Facts are blurred, obfuscated, and even removed and replaced with misinformation to push for the undesirable effects that will affect the present and future generations.

The work of SNIA®

Data preservation is part of Data Management. More than a decade ago, SNIA® has already set up a technical work group (TWG) on Long Term Retention and proposed a format for long-term storage of digital format. It was called SIRF (Self-contained Information Retention Format). In the words of SNIA®, “The SIRF format enables long-term physical storage, cloud storage and tape-based containers effective and efficient ways to preserve and secure digital information for many decades, even with the ever-changing technology landscape.”

I don’t think battling misinformation was SNIA®’s original intent, but the requirements for a vendor-neutral organization as such to present and promote long term data preservation is more needed than ever. The need to protect the truth is paramount.

SNIA® continues to work with many organizations to create and grow the ecosystem for long term information retention and data preservation.

NFTs can save data

Despite the hullabaloo of NFTs (non-fungible tokens), which is very much soiled and discredited by the present day cryptocurrency speculations, I view data (and metadata) preservation as a strong use case for NFTs. The action is to digitalize data into an NFT asset.

Here are a few arguments:

  1. NFTs are unique. Once they are verified and inserted into the blockchain, they are immutable. They cannot be modified, and each blockchain transaction is created with one never to be replicated hashed value.
  2. NFTs are decentralized. Most of the NFTs we know of today are minted via a decentralized process. This means that the powerful cannot (most of the time), effect the NFTs state according to its whims and fancies. Unless the perpetrators know how to manipulate a Sybil attack on the blockchain.
  3. NFTs are secure. I have to set the knowledge that NFTs in itself is mostly very secure. Most of the high profiled incidents related to NFTs are more of internal authentication vulnerabilities and phishing related to poor security housekeeping and hygiene of the participants.
  4. NFTs represent authenticity. The digital certification of the NFTs as a data asset also define the ownership and the originality as well. The record of provenance is present and accounted for.

Since NFTs started as a technology to prove the assets and artifacts of the creative industry, there are already a few organizations that playing the role. Orygin Art is one that I found intriguing. Museums are also beginning to explore the potential of NFTs including validating and verifying the origins of many historical artifacts, and digitizing these physical assets to preserve its value forever.

The technology behind NFTs are not without its weaknesses as well but knowing what we know today, the potential is evident and power of the technology has yet to be explored fully. It does present a strong case in preserving the integrity of truthful data, and the data as historical artifacts.

Protect data safety and data integrity

Misinformation is damaging. Regardless if we believe the Butterfly Effect or not, misinformation can cause a ripple effect that could turn into a tidal wave. We need to uphold the sanctity of Truth, and continue to protect data safety and data integrity. The world is already damaged, and it will be damaged even more if we allow misinformation to permeate into the fabric of the global societies. We may welcome to a dystopian future, unfortunately.

This blog hopes to shake up the nonchalant state that we view “information” and “misinformation” today. There is a famous quote that said “Repeat a lie often enough and it becomes the truth“. We must lead the call to combat misinformation. What we do now will shape the generations of our present and future. Preserve Truth.

WaPo “Democracy Dies in Darkness”

[ Condolence: Japan Prime Minister, Shinzo Abe, was assassinated last week. News sources mentioned that the man who killed him had information that the slain PM has ties to a religious group that bankrupted his mother. Misinformation may played a role in the killing of the Japanese leader. ]

Object Storage becoming storage lingua franca of Edge-Core-Cloud

Data Fabric was a big buzzword going back several years. I wrote a piece talking about Data Fabric, mostly NetApp®’s,  almost 7 years ago, which I titled “The Transcendence of Data Fabric“. Regardless of storage brands and technology platforms, and each has its own version and interpretations, one thing holds true. There must be a one layer of Data Singularity. But this is easier said than done.

Fast forward to present. The latest buzzword is Edge-to-Core-Cloud or Cloud-to-Core-Edge. The proliferation of Cloud Computing services, has spawned beyond to multiclouds, superclouds and of course, to Edge Computing. Data is reaching to so many premises everywhere, and like water, data has found its way.

Edge-to-Core-to-Cloud (Gratitude thanks to https://www.techtalkthai.com/dell-technologies-opens-iot-solutions-division-and-introduces-distributed-core-architecture/)

The question on my mind is can we have a single storage platform to serve the Edge-to-Core-to-Cloud paradigm? Is there a storage technology which can be the seamless singularity of data? 7+ years onwards since my Data Fabric blog, The answer is obvious. Object Storage.

The ubiquitous object storage and the S3 access protocol

For a storage technology that was initially labeled “cheap and deep”, object storage has become immensely popular with developers, cloud storage providers and is fast becoming storage repositories for data connectors. I wrote a piece called “All the Sources and Sinks going to Object Storage” over a month back, which aptly articulate how far this technology has come.

But unknown to many (Google NASD and little is found), object storage started its presence in SNIA (it was developed in Carnegie-Mellon University prior to that) in the early 90s, then known as NASD (network attached secure disk). As it is made its way into the ANSI T10 INCITS standards development, it became known as Object-based Storage Device or OSD.

The introduction of object storage services 16+ years ago by Amazon Web Services (AWS) via their Simple Storage Services (S3) further strengthened the march of object storage, solidified its status as a top tier storage platform. It was to AWS’ genius to put the REST API over HTTP/HTTPS with its game changing approach to use CRUD (create, retrieve, update, delete) operations to work with object storage. Hence the S3 protocol, which has become the de facto access protocol to object storage.

Yes, I wrote those 2 blogs 11 and 9 years ago respectively because I saw that object storage technology was a natural fit to the burgeoning new world of storage computing. It has since come true many times over.

Continue reading

Computational Storage embodies Data Velocity and Locality

I have been earnestly observing the growth of Computational Storage for a number of years now.  It was known by several previous names, with the name “in-situ data processing” stuck with me the most. The Computational Storage nomenclature became more cohesive when SNIA® put together the CMSI (Compute Memory Storage Initiative) some time back. This initiative is where several standards bodies, the major technology players and several SIGs (special interest groups) in SNIA® collaborated to advance Computational Storage segment in the storage technology industry we know of today.

The use cases for Computational Storage are burgeoning, and the functional implementations of Computational Storage are becoming vital to tackle the explosive data tsunami. In 2018 IDC, in its Worldwide Global Datasphere Forecast 2021-2025 report, predicted that the world will have 175 ZB (zettabytes) of data. That number, according to hearsay, has been revised to a heady figure of 250ZB, given the superlative rate data is being originated, spawned and more.

Computational Storage driving factors

If we take the Computer Science definition of in-situ processing, Computational Storage can be distilled as processing data where it resides. In a nutshell, “Bring Compute closer to Storage“. This means that there is a processing unit within the storage subsystem which does not require the host CPU to perform processing. In a very simplistic manner, a RAID card in a storage array can be considered a Computational Storage device because it performs the RAID functions instead of the host CPU. But this new generation of Computational Storage has much more prowess than just the RAID function in a RAID card.

There are many factors in Computational Storage that make a lot sense. Here are a few:

  1. Voluminous data inundate the centralized architecture of the cloud platforms and the enterprise systems today. Much of the data come from end point devices – mobile devices, sensors, IoT, point-of-sales, video cameras, et.al. Pre-processing the data at the origin data points can help filter the data, reduce the size to be processed centrally, and secure the data before they are ingested into the central data processing systems
  2. Real-time processing of the data at the moment the data is received gives the opportunity to create the Velocity of Data Analytics. Much of the data do not need to move to a central data processing system for analysis. Often in use cases like autonomous vehicles, fraud detection, recommendation systems, disaster alerts etc require near instantaneous responses. Performing early data analytics at the data origin point has tremendous advantages.
  3. Moore’s Law is waning. The CPU (central processing unit) is no longer the center of the universe. We are beginning to see CPU offloading technologies to augment the CPU’s duties such as compression, encryption, transcoding and more. SmartNICs, DPUs (data processing units), VPUs (visual processing units), GPUs (graphics processing units), etc have come forth to formulate a new computing paradigm.
  4. Freeing up central resources with Computational Storage also accelerates the overall distributed data processing in the whole data architecture. The CPU and the adjoining memory subsystem are less required to perform context switching caused by I/O interrupts as in most of the compute/storage architecture today. The total effect relieves the CPU and giving back more CPU cycles to perform higher processing tasks, resulting in faster performance overall.
  5. The rise of memory interconnects is enabling a more distributed computing fabric of data processing subsystems. The rising CXL (Compute Express Link™) interconnect protocol, especially after the Gen-Z annex, has emerged a force to be reckoned with. This rise of memory interconnects will likely strengthen the testimony of Computational Storage in the fast approaching future.

Computational Storage Deployment Models

SNIA Computational Storage Universe in 2019

Continue reading

What happened to NDMP?

The acronym NDMP shows up once in a while in NAS (Network Attached Storage) upgrade tenders. And for the less informed, NDMP (Network Data Management Protocol) was one of the early NAS data management (more like data mover specifications) initiatives to backup NAS devices, especially the NAS appliances that run proprietary operating systems code.

NDMP Logo

Backup software vendors often have agents developed specifically for an operating system or an operating environment. But back in the mid-1990s, 2000s, the internal file structures of these proprietary vendors were less exposed, making it harder for backup vendors to develop agents for them. Furthermore, there was a need to simplify the data movements of NAS files between backup servers and the NAS as a client, to the media servers and eventually to the tape or disk targets. The dominant network at the time ran at 100Mbits/sec.

To overcome this, Network Appliance® and PDC Solutions/Legato® developed the NDMP protocol, allowing proprietary NAS devices to run a standardized client-server architecture with the NDMP server daemon in the NAS and the backup service running as an NDMP client. Here is a simplified look at the NDMP architecture.

NDMP Client-Server Architecture

Continue reading

Open Source Storage Technology Crafters

The conversation often starts with a challenge. “What’s so great about open source storage technology?

For the casual end users of storage systems, regardless of SAN (definitely not Fibre Channel) or NAS on-premises, or getting “files” from the personal cloud storage like Dropbox, OneDrive et al., there is a strong presumption that open source storage technology is cheap and flaky. This is not helped with the diet of consumer brands of NAS in the market, where the price is cheap, but the storage offering with capabilities, reliability and performance are found to be wanting. Thus this notion floats its way to the business and enterprise users, and often ended up with a negative perception of open source storage technology.

Highway Signpost with Open Source wording

Storage Assemblers

Anybody can “build” a storage system with open source storage software. Put the software together with any commodity x86 server, and it can function with the basic storage services. Most open source storage software can do the job pretty well. However, once the completed storage technology is put together, can it do the job well enough to serve a business critical end user? I have plenty of sob stories from end users I have spoken to in these many years in the industry related to so-called “enterprise” storage vendors. I wrote a few blogs in the past that related to these sad situations:

We have such storage offerings rigged with cybersecurity risks and holes too. In a recent Unit 42 report, 250,000 NAS devices are vulnerable and exposed to the public Internet. The brands in question are mentioned in the report.

I would categorize these as storage assemblers.

Continue reading

The future of Fibre Channel in the Cloud Era

The world has pretty much settled that hybrid cloud is the way to go for IT infrastructure services today. Straddled between the enterprise data center and the infrastructure-as-a-service in public cloud offerings, hybrid clouds define the storage ecosystems and architecture of choice.

A recent Blocks & Files article, “Broadcom server-storage connectivity sales down but recovery coming” caught my attention. One segment mentioned that the server-storage connectivity sales was down 9% leading me to think “Is this a blip or is it a signal that Fibre Channel, the venerable SAN (storage area network) protocol is on the wane?

Fibre Channel Sign

Thus, I am pondering the position of Fibre Channel SANs in the cloud era. Where does it stand now and in the near future? Continue reading

Is Software Defined right for Storage?

George Herbert Leigh Mallory, mountaineer extraordinaire, was once asked “Why did you want to climb Mount Everest?“, in which he replied “Because it’s there“. That retort demonstrated the indomitable human spirit and probably exemplified best the relationship between the human being’s desire to conquer the physical limits of nature. The software of humanity versus the hardware of the planet Earth.

Juxtaposing, similarities can be said between software and hardware in computer systems, in storage technology per se. In it, there are a few schools of thoughts when it comes to delivering storage services with the notable ones being the storage appliance model and the software-defined storage model.

There are arguments, of course. Some are genuinely partisan but many a times, these arguments come in the form of the flavour of the moment. I have experienced in my past companies touting the storage appliance model very strongly in the beginning, and only to be switching to a “software company” chorus years after that. That was what I meant about the “flavour of the moment”.

Software Defined Storage

Continue reading

Layers in Storage – For better or worse

Storage arrays and storage services are built upon by layers and layers beneath its architecture. The physical components of hard disk drives and solid states are abstracted into RAID volumes, virtualized into other storage constructs before they are exposed as shares/exports, LUNs or objects to the network.

Everyone in the storage networking industry, is cognizant of the layers and it is the foundation of knowledge and experience. The public cloud storage services side is the same, albeit more opaque. Nevertheless, both have layers.

In the early 2000s, SNIA® Technical Council outlined a blueprint of the SNIA® Shared Storage Model, a framework describing layers and properties of a storage system and its services. It was similar to the OSI 7-layer model for networking. The framework helped many industry professionals and practitioners shaped their understanding and the development of knowledge in their respective fields. The layering scheme of the SNIA® Shared Storage Model is shown below:

SNIA Shared Storage Model – The layering scheme

Storage vendors layering scheme

While SNIA® storage layers were generic and open, each storage vendor had their own proprietary implementation of storage layers. Some of these architectures are simple, but some, I find a bit too complex and convoluted.

Here is an example of the layers of the Automated Volume Management (AVM) architecture of the EMC® Celerra®.

EMC Celerra AVM Layering Scheme

I would often scratch my head about AVM. Disks were grouped into RAID groups, which are LUNs (Logical Unit Numbers). Then they were defined as Celerra® dvols (disk volumes), and stripes of the dvols were consolidated into a storage pool.

From the pool, a piece of a storage capacity construct, called a slice volume, were combined with other slice volumes into a metavolume which eventually was presented as a file system to the network and their respective NAS clients. Explaining this took an effort because I was the IP Storage product manager for EMC® between 2007 – 2009. It was a far cry from the simplicity of NetApp® ONTAP 7 architecture of RAID groups and volumes, and the WAFL® (Write Anywhere File Layout) filesystem.

Another complicated layered framework I often gripe about is Ceph. Here is a look of how the layers of CephFS is constructed.

Ceph Storage Layered Framework

I work with the OpenZFS filesystem a lot. It is something I am rather familiar with, and the layered structure of the ZFS filesystem is essentially simpler.

Storage architecture mixology

Engineers are bizarre when they get too creative. They have a can do attitude that transcends the boundaries of practicality sometimes, and boggles many minds. This is what happens when they have their own mixology ideas.

Recently I spoke to two magnanimous persons who had the idea of providing Ceph iSCSI LUNs to the ZFS filesystem in order to use the simplicity of NAS file sharing capabilities in TrueNAS® CORE. From their own words, Ceph NAS capabilities sucked. I had to draw their whole idea out in a Powerpoint and this is the architecture I got from the conversation.

There are 3 different storage subsystems here just to provide NAS. As if Ceph layers aren’t complicated enough, the iSCSI LUNs from Ceph are presented as Cinder volumes to the KVM hypervisor (or VMware® ESXi) through the Cinder driver. Cinder is the persistent storage volume subsystem of the Openstack® project. The Cinder volumes/hypervisor datastore are virtualized as vdisks to the respective VMs installed with TrueNAS® CORE and OpenZFS filesystem. From the TrueNAS® CORE, shares and exports are provisioned via the SMB and NFS protocols to Windows and Linux respectively.

It works! As I was told, it worked!

A.P.P.A.R.M.S.C. considerations

Continuing from the layered framework described above for NAS, other aspects beside the technical work have to be considered, even when it can work technically.

I often use a set of diligent data storage focal points when considering a good storage design and implementation. This is the A.P.P.A.R.M.S.C. Take for instance Protection as one of the points and snapshot is the technology to use.

Snapshots can be executed at the ZFS level on the TrueNAS® CORE subsystem. Snapshots can be trigged at the volume level in Openstack® subsystem and likewise, rbd snapshots at the Ceph subsystem. The question is, which snapshot at which storage subsystem is the most valuable to the operations and business? Do you run all 3 snapshots? How do you execute them in succession in a scheduled policy?

In terms of performance, can it truly maximize its potential? Can it churn out the best IOPS, and deliver at wire speed? What is the latency we can expect with so many layers from 3 different storage subsystems?

And supporting this said architecture would be a nightmare. Where do you even start the troubleshooting?

Those are just a few considerations and questions to think about when such a layered storage architecture along. IMHO, such a design was over-engineered. I was tempted to say “Just because you can, doesn’t mean you should

Elegance in Simplicity

Einstein (I think) quoted:

Einstein’s quote on simplicity and complexity

I am not saying that having too many layers is wrong. Having a heavily layered architecture works for many storage solutions out there, where they are often masked with a simple and intuitive UI. But in yours truly point of view, as a storage architecture enthusiast and connoisseur, there is beauty and elegance in simple designs.

The purpose here is to promote better understanding of the storage layers, and how they integrate and interact with each other to deliver the data services to the network. In the end, that is how most storage architectures are built.

 

Storageless shan’t be thy name

Storageless??? What kind of a tech jargon is that???

This latest jargon irked me. Storage vendor NetApp® (through its acquisition of Spot) and Hammerspace, a metadata-driven storage agnostic orchestration technology company, have begun touting the “storageless” tech jargon in hope that it will become an industry buzzword. Once again, the hype cycle jargon junkies are hard at work.

Clear, empty storage containers

Clear, nondescript storage containers

It is obvious that the storageless jargon wants to ride on the hype of serverless computing, an abstraction method of computing resources where the allocation and the consumption of resources are defined by pieces of programmatic code of the running application. The “calling” of the underlying resources are based on the application’s code, and thus, rendering the computing resources invisible, insignificant and not sexy.

My stand

Among the 3 main infrastructure technology – compute, network, storage, storage technology is a bit of a science and a bit of dark magic. It is complex and that is what makes storage technology so beautiful. The constant innovation and technology advancement continue to make storage as a data services platform relentlessly interesting.

Cloud, Kubernetes and many data-as-a-service platforms require strong persistent storage. As defined by NIST Definition of Cloud Computing, the 4 of the 5 tenets – on-demand self-service, resource pooling, rapid elasticity, measured servicedemand storage to be abstracted. Therefore, I am all for abstraction of storage resources from the data services platform.

But the storageless jargon is doing a great disservice. It is not helping. It does not lend its weight glorifying the innovations of storage. In fact, IMHO, it felt like a weighted anchor sinking storage into the deepest depth, invisible, insignificant and not sexy. I am here dutifully to promote and evangelize storage innovations, and I am duly unimpressed with such a jargon.

Continue reading