Storage Optimization – Page 2

A Data Management culture to combat Ransomware

By cfheoh | July 10, 2023 - 8:00 am |July 10, 2023 Backup, Business Continuity, Data, Data Archiving, Data Availability, Data Corruption, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Disaster Recovery, EasiShare, ILM, iXsystems, NAS, Reliability, Security, Snapshots, Storage Optimization, Storage Tiering, Tape storage, TrueNAS

1 Comment

On the road, seat belt saves lives. So does the motorcycle helmet. But these 2 technologies alone are probably not well received and well applied daily unless there is a strong ecosystem and culture about road safety. For decades, there have been constant and unrelenting efforts to enforce the habits of putting on the seat belt or the helmet. Statistics have shown they reduce road fatalities, but like I said, it is the safety culture that made all this happen.

On the digital front, the ransomware threats are unabated. In fact, despite organizations (and individuals), both large and small, being more aware of cyber-hygiene practices more than ever, the magnitude of ransomware attacks has multiplied. Threat actors still see weaknesses and gaps, and vulnerabilities in the digital realms, and thus, these are lucrative ventures that compliment the endeavours.

Time to look at Data Management

The Cost-Benefits-Risks Conundrum of Data Management

And I have said this before in the past. At a recent speaking engagement, I brought it up again. I said that ransomware is not a cybersecurity problem. Ransomware is a data management problem. I got blank stares from the crowd.

I get it. It is hard to convince people and companies to embrace a better data management culture. I think about the Cost-Benefits-Risk triangle while I was analyzing the lack of data management culture used in many organizations when combating ransomware.

I get it that Cybersecurity is big business. Even many of the storage guys I know wanted to jump into the cybersecurity bandwagon. Many of the data protection vendors are already mashing their solutions with a cybersecurity twist. That is where the opportunities are, and where the cool kids hang out. I get it.

Cybersecurity technologies are more tangible than data management. I get it when the C-suites like to show off shiny new cybersecurity “toys” because they are allowed to brag. Oh, my company has just implemented security brand XXX, and it’s so cool! They can’t be telling their golf buddies that they have a new data management culture, can they? What’s that?

Continue reading →

How well do you know your data and the storage platform that processes the data

By cfheoh | December 20, 2021 - 7:30 am |December 20, 2021 100Gigabit Ethernet, 10Gigabit Ethernet, Algorithm, Analytics, Appliance, Backup, Big Data, Business Continuity, Cloud, Clusters, Composable Infrastructure, compression, Confluent, Data Archiving, Data Availability, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deduplication, Digital Transformation, Disaster Recovery, Edge Computing, Filesystems, Hyperconvergence, ILM, Industry 4.0, InfluxDB, iRODS, Machine Learning, NAS, NFS, NVMe, Object Storage, Performance Caching, Pravega, Reliability, SATA, Scale-out architecture, Security, Software Defined Storage, Storage Area Network, Storage Optimization, Storage Tiering, Unified Storage, VDI, Virtualization

Leave a comment

Last week was consumed by many conversations on this topic. I was quite jaded, really. Unfortunately many still take a very simplistic view of all the storage technology, or should I say over-marketing of the storage technology. So much so that the end users make incredible assumptions of the benefits of a storage array or software defined storage platform or even cloud storage. And too often caveats of turning on a feature and tuning a configuration to the max are discarded or neglected. Regards for good storage and data management best practices? What’s that?

I share some of my thoughts handling conversations like these and try to set the right expectations rather than overhype a feature or a function in the data storage services.

Complex data networks and the storage services that serve it

I/O Characteristics

Applications and workloads (A&W) read and write from the data storage services platforms. These could be local DAS (direct access storage), network storage arrays in SAN and NAS, and now objects, or from cloud storage services. Regardless of structured or unstructured data, different A&Ws have different behavioural I/O patterns in accessing data from storage. Therefore storage has to be configured at best to match these patterns, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:

Random and Sequential patterns
Block sizes of these A&Ws ranging from typically 4K to 1024K.
Causal effects of synchronous and asynchronous I/Os to and from the storage

Continue reading →

The burgeoning world of NVMe

By cfheoh | November 29, 2021 - 8:00 am |November 28, 2021 100Gigabit Ethernet, Cloud, Composable Infrastructure, CXL, Disks, Fibre Channel, IDC, Infiniband, iSCSI, Memory Cloud, NVMe, Performance Benchmark, Performance Caching, Software-defined Datacenter, Solid State Devices, Storage Area Network, Storage Optimization, Virtualization

1 Comment

When I wrote this article “Let’s smoke this storage peace pipe” 5 years ago, I quoted:

“NVMe® and NVM®eF‰, as it evolves, can become the Great Peacemaker and bringing both divides and uniting them into a single storage fabric.“

I envisioned NVMe® and NVMe®oF™ setting the equilibrium at the storage architecture level, finishing the great storage fabric into one. This balance in the storage ecosystem at the storage interface specifications and language-protocol level has rapidly unifying storage today, and we are already seeing the end-to-end NVMe paths directly from the PCIe bus of one host to another, via networks over Ethernet (with RoCE, iWARP, and TCP flavours) and Fibre Channel™. Technically we can have an end point device, example a tablet, talking the same NVMe language to its embedded storage as well as a cloud NVMe storage in an exascale storage far, far away. In the past, there were just too many bridges, links, viaducts, aqueducts, bypasses, tunnels, flyovers to cross just to deliver a storage command, or a data in a formats, encased and encoded (and decoded) in so many different ways.

Colours in equilibrium, like the rainbow

Simple basics of NVMe®

SATA (Serial Attached ATA) and SAS (Serial Attached SCSI) are not optimized for solid state devices. besides legacy stuff like AHCI (Advanced Host Controller Interface) in SATA, and archaic SCSI-3 primitives in SAS, NVM® has so much to offer. It can achieve very high bandwidth and support 65,535 I/O queues, each with a queue depth of 65,535. The queue depth alone is a massive jump compared to SAS which has a queue depth limit of 256.

A big part of this is how NVMe® handles I/O processing. It has a submission queue (SQ) and a completion queue (CQ), and together they are know as a Queue Pair (QP). The NVMe® controller handles tens of thousands at I/Os (reads and writes) simultaneously, alerted to switch between each SQ and CQ very quickly using the MSI or MSI-X interrupt. Think of MSI and MSI-X as a service bell, a hardware register that informs the NVM® controller when there are requests in the SQ, and informs the hosts that there are completed requests in the CQ. There will be plenty of “dings” by the MSI-X service register but the NVMe® controller can perform it very well, with some smart interrupt coalescing.

NVMe I/O processing

NVMe® 1.1, as I recalled, used to be have 3 admin commands and 10 base commands, which made it very lightweight compared to SCSI-3. However, newer commands were added to NVMe® 2.0 specifications included command sets fo key-value operations and zoned named space.

Continue reading →

Storage Elephant Compute Birds

By cfheoh | November 22, 2021 - 8:00 am |November 21, 2021 Actifio, Amazon Web Services, Analytics, API, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Composable Infrastructure, Data, Data Archiving, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Deduplication, DellEMC, Delphix, Digital Transformation, Disaster Recovery, Edge Computing, Hadoop, Hyperconvergence, IoT, iXsystems, Kubernetes, Machine Learning, NetApp, Nimble Storage, Nutanix, Object Storage, Simplivity, Snapshots, Storage Optimization, Storage Tiering, TrueNAS, Virtualization, VMware

Leave a comment

Data movement is expensive. Not just costs, but also latency and resources as well. Thus there were many narratives to move compute closer to where the data is stored because moving compute is definitely more economical than moving data. I borrowed the analogy of the 2 animals from some old NetApp® slides which depicted storage as the elephant, and compute as birds. It was the perfect analogy, because the storage is heavy and compute is light.

“Close up of a white Great Egret perching on top of an African Elephant aa Amboseli national park, Kenya”

Before the animals representation came about I used to use the term “Data locality, Data Mobility“, because of past work on storage technology in the Oil & Gas subsurface data management pipeline.

Take stock of your data movement

I had recent conversations with an end user who has been paying a lot of dollars keeping their “backup” and “archive” in AWS Glacier. The S3 storage is cheap enough to hold several petabytes of data for years, because the IT folks said that the data in AWS Glacier are for “backup” and “archive”. I put both words in quotes because they were termed as “backup” and “archive” because of their enterprise practice. However, the face of their business is changing. They are in manufacturing, oil and gas downstream, and the definitions of “backup” and “archive” data has changed.

For one, there is a strong demand for reusing the past data for various reasons and these datasets have to be recalled from their cloud storage. Secondly, their data movement activities still mimicked what they did in the past during their enterprise storage days. It was a classic lift-and-shift when they moved to the cloud, and not taking stock of their data movements and the operations they ran on these datasets. Still ongoing, their monthly AWS cost a bomb.

Continue reading →

What the heck is Storage Modernization?

By cfheoh | September 20, 2021 - 7:00 am |September 19, 2021 Acquisition, Analytics, API, Artificial Intelligence, Big Data, Business Continuity, Cloud, Containers, Data, Data Archiving, Data Availability, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Disaster Recovery, Edge Computing, Green Computing, Hadoop, Hadoop Clusters, Kubernetes, Machine Learning, MapReduce, Reliability, Software-defined Datacenter, Solid State Devices, Storage Optimization, Tape storage

Leave a comment

We often hear the word “modernization” thrown around these days. The push is to get the end user to refresh their infrastructure, and the storage infrastructure market is rife with modernization word. Is your storage ripe for “modernization“?

Many possibilities to modernize storage

To modernize, it has to be relative to legacy storage hardware, and the operating environment that came with it. But if the so-called “legacy” still does the job, should you modernize?

Big Data is right

When the word “Big Data” came into prominence a while back, it stirred the IT industry into a frenzy. At one point, Apache Hadoop became the poster elephant (pun intended) for this exciting new segment. So many Vs came out, but I settled with 4 Vs as the framework of my IT conversations. The 4Vs we often hear are:

Volume
Velocity
Variety
Veracity

Continue reading →

The future of Fibre Channel in the Cloud Era

By cfheoh | September 13, 2021 - 7:30 am |September 13, 2021 100Gigabit Ethernet, Algorithm, Analytics, API, Appliance, Backup, Big Data, Broadcom, Brocade, Business Continuity, Cloud, Containers, Data, Data Availability, Data Fabric, Data Protection, Data Security, Deep Learning, Digital Transformation, Disaster Recovery, Disks, Edge Computing, FCoE, Fibre Channel, HDS, High Performance Computing, Hitachi Vantara, Hyperconvergence, iSCSI, Kubernetes, Machine Learning, NetApp, NVMe, Performance Benchmark, QLogic, RAID, Reliability, SATA, Scale-out architecture, SNIA, Solid State Devices, Storage Area Network, Storage Optimization, Storage Tiering

3 Comments

The world has pretty much settled that hybrid cloud is the way to go for IT infrastructure services today. Straddled between the enterprise data center and the infrastructure-as-a-service in public cloud offerings, hybrid clouds define the storage ecosystems and architecture of choice.

A recent Blocks & Files article, “Broadcom server-storage connectivity sales down but recovery coming” caught my attention. One segment mentioned that the server-storage connectivity sales was down 9% leading me to think “Is this a blip or is it a signal that Fibre Channel, the venerable SAN (storage area network) protocol is on the wane?”

Fibre Channel Sign

Thus, I am pondering the position of Fibre Channel SANs in the cloud era. Where does it stand now and in the near future? Continue reading →

Memory cloud reality soon?

By cfheoh | May 31, 2021 - 9:00 am |May 29, 2021 100Gigabit Ethernet, Big Switch Networks, Cloud, CXL, Dell, Fibre Channel, High Performance Computing, Intel, Memory Cloud, Performance Caching, Storage Optimization, Virtualization

Leave a comment

The original SAN was not always Storage Area Network. SAN had a twin nomenclature called System Area Network (SAN) back in the late 90s. Fibre Channel fabric topology (THE Storage Area Network) was only starting to take off when many of the Fibre Channel topologies at the time were either FC-AL (Fibre Channel Arbitrated Loop) or Point-to-Point. So, for a while SAN was System Area Network, or at least that was what Microsoft® wanted it to be. That SAN obviously did not take off.

System Area Network (architecture shown below) presented a high speed network where server clusters can communicate. The communication protocol of choice was VIA (Virtual Interface Adapter), and the proposed applications, notably the Microsoft® SQL Server, would use Winsock API to interface with the network services. Cache coherency in the combined memory resources of a clustered network is often the technology to ensure data synchronization, consistency and integrity.

Alas, System Area Network did not truly take off, and now it is pretty much deprecated from the Microsoft® universe.

System Area Network (SAN)

Continue reading →

Is Software Defined right for Storage?

By cfheoh | April 19, 2021 - 9:00 am |April 18, 2021 Acquisition, API, Appliance, ATA over Ethernet, Ceph, Composable Infrastructure, Coraid, Cray Inc, Data Direct Networks, Deduplication, deduplication, Filesystems, FreeNAS, Gluster, HDS, High Performance Computing, HPE Simplivity, Hyperconvergence, Infiniband, Intel, iXsystems, Liqid, Mellanox Technologies, Minio, NetApp, Nexenta, Nutanix, nVidia, OpenIO, Openstack, Oracle, PCIe, Performance Benchmark, Performance Caching, Pure Storage, RAID, RDMA, Redhat, Simplivity, SNIA, SoftIron, Software Defined Storage, Solaris, Solid State Devices, Storage Optimization, TrueNAS, Vast Data, Virtualization

Leave a comment

George Herbert Leigh Mallory, mountaineer extraordinaire, was once asked “Why did you want to climb Mount Everest?“, in which he replied “Because it’s there“. That retort demonstrated the indomitable human spirit and probably exemplified best the relationship between the human being’s desire to conquer the physical limits of nature. The software of humanity versus the hardware of the planet Earth.

Juxtaposing, similarities can be said between software and hardware in computer systems, in storage technology per se. In it, there are a few schools of thoughts when it comes to delivering storage services with the notable ones being the storage appliance model and the software-defined storage model.

There are arguments, of course. Some are genuinely partisan but many a times, these arguments come in the form of the flavour of the moment. I have experienced in my past companies touting the storage appliance model very strongly in the beginning, and only to be switching to a “software company” chorus years after that. That was what I meant about the “flavour of the moment”.

Software Defined Storage

Continue reading →

A FreeNAS Compression Tale

By cfheoh | March 22, 2021 - 9:00 am |March 22, 2021 Algorithm, compression, deduplication, Deduplication, Delphix, Filesystems, FreeNAS, iXsystems, NetApp, OpenZFS, Storage Optimization, TrueNAS

1 Comment

David vs Goliath Credit: Miguel Robledo of https://www.artstation.com/miguel_robledo

David vs Goliath

It was an underdog tale worthy of the biblical book of Samuel. When I first caught wind of how FreeNAS™ compression prowess was going against NetApp® compression and deduplication in one use case, I had to find out more. And the results in this use case was quite impressive considering that FreeNAS™ (now known as TrueNAS® CORE) is the free, open source storage operating system and NetApp® Data ONTAP, is the industry leading, enterprise, “king of the hill” storage data management software.

Certainly a David vs Goliath story.

Compression in FreeNAS

Ah, Compression! That technology that is often hidden, hardly seen and often forgotten.

Compression is a feature within FreeNAS™ that seldom gets the attention. It works, and certainly is a mature form of data footprint reduction (DFR) technology, along with data deduplication. It is switched on by default, and is the setting when creating a dataset, as shown below:

Dataset creation with Compression (lz4) turned on

The default compression algorithm is lz4 which is fast but poor in compression ratio compared to gzip and bzip2. However, lz4 uses less CPU cycles to perform its compression and decompression processing, and thus the impact on FreeNAS™ and TrueNAS® is very low.

Quick Benchmark: Gzip vs Bzip2 vs LZMA vs XZ vs LZ4 vs LZO

NetApp® ONTAP, if I am not wrong, uses lzopro as default – a commercial and optimized version of the open source LZO compression library. In addition, NetApp also has their data deduplication technology as well, something OpenZFS has to improve upon in the future.

The DFR report

This brings us to the use case at one of iXsystems™ customers in Taiwan. The data to be reduced are mostly log files at the end user, and the version of FreeNAS™ is 11.2u7. There are, of course, many factors that affect the data reduction ratio, but in this case of 4 scenarios, the end user has been running this in production for over 2 months. The results:

FreeNAS vs NetApp Data Footprint Reduction

In 2 of the 4 scenarios, FreeNAS™ performed admirably with just the default lz4 compression alone, compared to NetApp® which was running both their inline compression and deduplication.

The intention to post this report is not to show that FreeNAS™ is better in every case. It won’t be, and there are superior data footprint reduction tech out there which can outperform it. But I would expect potential and existing end users to leverage on the compression capability of FreeNAS™ which is getting better all the time.

A better compression algorithm

Followers of OpenZFS are aware of the changing of times with OpenZFS version 2.0. One exciting update is the introduction of the zstd compression algorithm into OpenZFS late last year, and is already in TrueNAS® CORE and Enterprise version 12.x.

What is zstd? zstd is a fast compression algorithm that aims to be as efficient (or better) than gzip, but with better speed closer to lz4, relatively. For a long time, the gzip compression algorithm, from levels 1-9, has been serving very good compression ratio compared to many compression algorithms, lz4 included.

However, the efficiency came at a higher processing price and thus took a longer time. At the other end, lz4 is fast and lightweight, but its reduction ratio efficiency is very poor. zstd intends to be the in-between of gzip and lz4. In the latest results published by Facebook’s github page,

zstd performance benchmark against other compression algorithms

For comparison, zstd (level -1) performed very well against zlib, the data compression library in gzip. It was made known there are 22 levels of compression in zstd but I do not know how many levels are accepted in the OpenZFS development.

At the same time, compression takes advantage of multi-core processing, and actually can speed up disk I/O response because the original dataset to be processed is smaller after the compression reduction.

While TrueNAS® still defaults lz4 compression as of now, you can probably change the default compression with a command

# zfs set compression=zstd-6 pool/dataset

Your choice

TrueNAS® and FreeNAS™ support multiple compression algorithms. lz4, gzip and now zstd. That gives the administrator a choice to assign the right compression algorithm based on processing power, storage savings, and time to get the best out of the data stored in the datasets.

As far as the David vs Goliath tale goes, this real life use case was indeed a good one to share.

When you buy storage solutions on price alone

By cfheoh | March 1, 2021 - 9:00 am |February 28, 2021 Appliance, Data Management, Data Protection, Data Security, Digital Transformation, Storage Optimization

Leave a comment

Most people won’t bat an eye buying a car. It is a status symbol for many, but the value of the work returned from the car to the cost of buying the car is a great disparity. Furthermore, the price of the car depreciates quickly, making the “investment” more like an act of losing money fast.

So the story begins. When it comes to buying a storage technology platform, the initial price on the quote more or less decide the outcome. The reply of “Too expensive!” with little consideration about the returns of certain values relative to the initial buying price is far too frequent and plenty.

There has to be more considerations about these values. Here are in buying a storage technology platform besides just the initial price.

Performance

One recent conversation was about Intel® Optane™ vs NAND Flash. An well-known online eCommerce proprietor in South East Asia decided to go against the grain, and went for the more “expensive” Optane™ instead of the getting an array of NAND Flash NVMe SSDs.

Continue reading →

Category Archives: Storage Optimization

A Data Management culture to combat Ransomware

The Cost-Benefits-Risks Conundrum of Data Management