Crash consistent data recovery for ZFS volumes

While TrueNAS® CORE and TrueNAS® Enterprise are more well known for its NAS (network attached storage) prowess, many organizations are also confidently placing their enterprise applications such as hypervisors and databases on TrueNAS® via SANs (storage area networks) as well. Both iSCSI and Fibre Channel™ (selected TrueNAS® Enterprise storage models) protocols are supported well.

To reliably protect these block-based applications via the SAN protocols, ZFS snapshot is the key technology that can be dependent upon to restore the enterprise applications quickly. However, there are still some confusions when it comes to the state of recovery from the ZFS snapshots. On that matter, this situations are not unique to the ZFS environments because as with many other storage technologies, the confusion often stem from the (mis)understanding of the consistency state of the data in the backups and in the snapshots.

Crash Consistency vs Application Consistency

To dispel this misunderstanding, we must first begin with the understanding of a generic filesystem agnostic snapshot. It is a point-in-time copy, just like a data copy on the tape or in the disks or in the cloud backup. It is a complete image of the data and the state of the data at the storage layer at the time the storage snapshot was taken. This means that the data and metadata in this snapshot copy/version has a consistent state at that point in time. This state is frozen for this particular snapshot version, and therefore it is often labeled as “crash consistent“.

In the event of a subsystem (application, compute, storage, rack, site, etc) failure or a power loss, data recovery can be initiated using the last known “crash consistent” state, i.e. restoring from the last good backup or snapshot copy. Depending on applications, operating systems, hypervisors, filesystems and the subsystems (journals, transaction logs, protocol resiliency primitives etc) that are aligned with them, some workloads will just continue from where it stopped. It may already have some recovery mechanisms or these workloads can accept data loss without data corruption and inconsistencies.

Some applications, especially databases, are more sensitive to data and state consistencies. That is because of how these applications are designed. Take for instance, the Oracle® database. When an Oracle® database instance is online, there is an SGA (system global area) which handles all the running mechanics of the database. SGA exists in the memory of the compute along with transaction logs, tablespaces, and open files that represent the Oracle® database instance. From time to time, often measured in seconds, the state of the Oracle® instance and the data it is processing have to be synched to non-volatile, persistent storage. This commit is important to ensure the integrity of the data at all times.

Continue reading

Ransomware recovery with TrueNAS ZFS snapshots

This is really an excuse to install and play around with TrueNAS® CORE 12.0.

I had a few “self assigned homework exercises” I have to do this weekend. I was planning to do a video webcast with an EFSS vendor soon, and the theme should be around ransomware. Then one of the iXsystems™ resellers, unrelated to the first exercise, was talking about this ransomware messaging yesterday after we did a technical training with them. And this weekend is coming on a bit light as well. So I thought I could bring all these things, including checking out the TrueNAS® CORE 12.0, together in a video (using Free Cam), of which I would do for the first time as well. WOW! I can kill 4 birds with one stone! All together in one blog!

It could be Adam Brown 89 or worse

Trust me. You do not want AdamBrown89 as your friend. Or his thousands of ransomware friends.

When (not if) you are infected by ransomware, you get a friendly message like this in the screenshot below. I got this from a local company who asked for my help a few months ago.

AdamBrown89 ransomware message

AdamBrown89 ransomware message

I have written about this before. NAS (Network Attached Storage) has become a gold mine for ransomware attackers, and many entry level NAS products are heavily inflicted with security flaws and vulnerabilities. Here are a few notable articles in year 2020 alone. [ Note: This has been my journal of the security flaws of NAS devices from 2020 onwards ]

Continue reading

Playing with NetApp … final usable capacity

This is the third and last blog entry of how do we get the ONTAP final capacity.

In my first blog, we ran through a gamut of explanations how disk rightsizing came about for NetApp’s ONTAP. And the importance of disk rightsizing is to give ONTAP a level set of disks, regardless of manufacturer, model, make, firmware versions and so on, and ONTAP is pretty damn sure that the disks that it gets will not mess up.

In my second blog, progressing from the disk rightsizing stage, was the RAID group sizing stage, where different RAID group size affected the number of disks used for data and for parity in an aggregate. An aggregate, for the uninformed, is the disks pool in which the flexible volume, FlexVol, is derived. In a simple picture below,

OK, the diagram’s in Japanese (I am feeling a bit cheeky today :P)!

But it does look a bit self explanatory with some help which I shall provide now. If you start from the bottom of the picture, 16 x 300GB disks are combined together to create a RAID Group. And there are 4 RAID Groups created – rg0, rg1, rg2 and rg3. These RAID groups make up the ONTAP data structure called an aggregate. From ONTAP version 7.3 onward, there were some minor changes of how ONTAP reports capacity but fundamentally, it did not change much from previous versions of ONTAP. And also note that ONTAP takes a 10% overhead of the aggregate for its own use.

With the aggregate, the logical structure called the FlexVol is created. FlexVol can be as small as several megabytes to as large as 100TB, incremental by any size on-the-fly. This logical structure also allow shrinking of the capacity of the volume online and on-the-fly as well. Eventually, the volumes created from the aggregate become the next-building blocks of NetApp NFS and CIFS volumes and also LUNs for iSCSI and Fibre Channel. Also note that, for a more effective organization of logical structures from the volumes, using qtree is highly recommended for files and ONTAP management reasons.

However, for both aggregate and the FlexVol volumes created from the aggregate, snapshot reserve is recommended. The aggregate takes a 5% overhead of the capacity for snapshot reserve, while for every FlexVol volume, a 20% snapshot reserve is applied. While both snapshot percentage are adjustable, it is recommended to keep them as best practice (except for FlexVol volumes assigned for LUNs, which could be adjusted to 0%)

Note: Even if the snapshot reserve is adjusted to 0%, there are still some other rule sets for these LUNs that will further reduce the capacity. When dealing with NetApp engineers or pre-sales, ask them about space reservations and how they do snapshots for fat LUNs and thin LUNs and their best practices in these situations. Believe me, if you don’t ask, you will be very surprised of the final usable capacity allocated to your applications)

In a nutshell, the dissection of capacity after the aggregate would look like the picture below:

 

We can easily quantify the overall usable in the little formula that I use for some time:

Rightsized Disks capacity x # Disks x 0.90 x 0.95 = Total Aggregate Usable Capacity

Then remember that each volume takes a 20% snapshot reserve overhead. That’s what you have got to play with when it comes to the final usable capacity.

Though the capacity is not 100% accurate because there are many variables in play but it gives the customer a way to manually calculate their potential final usable capacity.

Please note the following best practices and this is only applied to 1 data aggregate only. For more aggregates, the same formula has to be applied again.

  1. A RAID-DP, 3-disk rootvol0, for the root volume is set aside and is not accounted for in usable capacity
  2. A rule-of-thumb of 2-disks hot spares is applied for every 30 disks
  3. The default RAID Group size is used, depending on the type of disk profile used
  4. Snapshot reserves default of 5% for aggregate and 20% per FlexVol volumes are applied
  5. Snapshots for LUNs are subjected to space reservation, either full or fractional. Note that there are considerations of 2x + delta and 1x + delta (ask your NetApp engineer) for iSCSI and Fibre Channel LUNs, even though snapshot reserves are adjusted to 0% and snapshots are likely to be turned off.

Another note that remember is not to use any of those Capacity Calculators given. These calculators are designed to give advantage to NetApp, not necessarily to the customer. Therefore, it is best to calculate these things by hand.

Regardless of how the customer will get as the overall final usable capacity, it is the importance to understand the NetApp philosophy of doing things. While we have perhaps, went overboard explaining the usable capacity and the nitty gritty that comes with it, all these things are done for a reason to ensure simplicity and ease of navigating data management in the storage networking world. Other NetApp solutions such as SnapMirror and SnapVault and also the SnapManager suite of product rely heavily on this.

And the intangible benefits of NetApp and ONTAP definitely have moved NetApp forward since its early years, into what NetApp is today, a formidable storage juggernaut.

Can snapshots replace traditional backups?

Backup is necessary evil. In IT, every operator, administrator, engineer, manager, and C-level executive knows that you got to have backup. When it comes to the protection of data and information in a business, backup is the only way.

Backup has also become the bane of IT operations. Every product that is out there in the market is trying to cram as much production data to backup as possible just to fit into the backup window. We only have 24 hours in a day, so there is no way the backup window can be increased unless

  • You reduce the size of the primary data to be backed up – think compression, deduplication, archiving
  • You replicate the primary data to a secondary device and backup the secondary device – which is ironic because when you replicate, you are creating a copy of the primary data, which technically is a backup. So you are technically backing up a backup
  • You speed up the transfer of primary data to the backup device

Either way, the IT operations is trying to overcome the challenges of the backup window. And the whole purpose for backup is to be cock-sure that data can be restored when it comes to recovery. It’s like insurance. You pay for the premium so that you are able to use the insurance facility to recover during the times of need. We have heard that analogy many times before.

On the flip side of the coin, a snapshot is also a backup. Snapshots are point-in-time copies of the primary data and many a times, snapshots are taken and then used as the source of a “true” backup to a secondary device, be it disk-based or tape-based. However, snapshots have suffered the perception that it is a pseudo-backup, until recent last couple of years.

Here are some food for thoughts …

WHAT IF we eliminate backing data to a secondary device?

WHAT IF the IT operations is ready to embrace snapshots as the true backup?

WHAT IF we rely on snapshots for backup and replicated snapshots for disaster recovery?

First of all, it will solve the perennial issues of backup to a “secondary device”. The operative word here is the “secondary device”, because that secondary device is usually external to the primary storage.

Tape subsystems and tape are constantly being ridiculed as the culprit of missing backup windows. Duplications after duplications of the same set of files in every backup set triggered the adoption of deduplication solutions from Data Domain, Avamar, PureDisk, ExaGrid, Quantum and so on. Networks are also blamed because network backup runs through the LAN. LANless backup will use another conduit, usually Fibre Channel, to transport data to the secondary device.

If we eliminate the “secondary device” and perform backup in the primary storage itself, then networks are no longer part of the backup. There is no need for deduplication because the data could already have been deduplicated and compressed in the primary storage.

Note that what I have suggested is to backup, compress and dedupe, AND also restore from the primary storage. There is no secondary storage device for backup, compress, dedupe and restore.

Wouldn’t that paint a better way of doing backup?

Snapshots will be the only mechanism to backup. Snapshots are quick, usually in minutes and some in seconds. Most snapshot implementations today are space efficient, consuming storage only for delta changes. The primary device will compress and dedupe, depending on the data’s characteristics.

For DR, snapshots are shipped to a remote storage of equal prowess at the DR site, where the snapshot can be rebuild and be in a ready mode to become primary data when required. NetApp SnapVault is one example. ZFS snapshot replication is another.

And when it comes to recovery, quick restores of primary data will be from snapshots. If the primary storage goes down, clients and host initiators can be rerouted quickly to the DR device for services to resume.

I believe with the convergence of multi-core processing power, 10GbE networks, SSDs, very large capacity drives, we could be seeing a shift in the backup design model and possible the entire IT landscape. Snapshots could very likely replace traditional backup in the near future, and secondary device may be a thing of the past.