Discovery of the 8th element – Element R

I am so blind. After more than 20 years in the industry, I have chosen to be blind to one of the most important elements of data protection and availability. Yet, I have been talking about it over and over, and over again but never really incorporated it into mantra.

Some readers will know that I frequently use these 7 points (or elements) in my approach to storage infrastructure and information management. These are:

  • Availability
  • Performance
  • Protection
  • Accessibility
  • Management
  • Security
  • Compliance

A few days ago, I had an epiphany. I woke up in the morning, feeling so enlightened and yet conflicted with the dumbfounded dumb feeling. It was so weird, and that moment continued to play in my mind like a broken record. I had to let it out and hence I am writing this down now.

Element RRecovery, Resiliency, Restorability, Resumption. That’s the element which I “discovered“. I was positively stunned that I never incorporated such an important element in my mantra, until now. Continue reading

MASSive, Impressive, Agile, TEGILE

Ah, my first blog after Storage Field Day 6!

It was a fantastic week and I only got to fathom the sensations and effects of the trip after my return from San Jose, California last week. Many thanks to Stephen Foskett (@sfoskett), Tom Hollingsworth (@networkingnerd) and Claire Chaplais (@cchaplais) of Gestalt IT for inviting me over for that wonderful trip 2 weeks’ ago. Tegile was one of the companies I had the privilege to visit and savour.

In a world of utterly confusing messaging about Flash Storage, I was eager to find out what makes Tegile tick at the Storage Field Day session. Yes, I loved Tegile and the campus visit was very nice. I was also very impressed that they have more than 700 customers and over a thousand systems shipped, all within 2 years since they came out of stealth in 2012. However, I was more interested in the essence of Tegile and what makes them stand out.

I have been a long time admirer of ZFS (Zettabyte File System). I have been a practitioner myself and I also studied the file system architecture and data structure some years back, when NetApp and Sun were involved in a lawsuit. A lot of have changed since then and I am very pleased to see Tegile doing great things with ZFS.

Tegile’s architecture is called IntelliFlash. Here’s a look at the overview of the IntelliFlash architecture:

Tegile IntelliFlash Architecture

So, what stands out for Tegile? I deduce that there are 3 important technology components that defines Tegile IntelliFlash ™ Operating System.

  • MASS (Metadata Accelerator Storage System)
  • Media Management
  • Inline Compression and Inline Deduplication

What is MASS? Tegile has patented MASS as an architecture that allows optimized data path to the file system metadata.

Often a typical file system metadata are stored together with the data. This results in a less optimized data access because both the data and metadata are given the same priority. However, Tegile’s MASS writes and stores the filesystem metadata in very high speed, low latency DRAM and Flash SSD. The filesystem metadata probably includes some very fine grained and intimate details about the mapping of blocks and pages to the respective capacity Flash SSDs and the mechanical HDDs. (Note: I made an educated guess here and I would be happy if someone corrected me)

Going a bit deeper, the DRAM in the Tegile hybrid storage array is used as a L1 Read Cache, while Flash SSDs are used as a L2 Read and Write Cache. Tegile takes further consideration that the Flash SSDs used for this caching purpose are different from the denser and higher capacity Flash SSDs used for storing data. These Flash SSDs for caching are obviously the faster, lower latency type of eMLCs and in the future, might be replaced by PCIe Flash optimized by NVMe.

Tegile DRAM-Flash Caching

This approach gives absolute priority, and near-instant access to the filesystem’s metadata, making the Tegile data access incredibly fast and efficient.

Tegile’s Media Management capabilities excite me. This is because it treats every single Flash SSD in the storage array with very precise organization of 3 types of data patterns.

  1. Write caching, which is high I/O is focused on a small segment of the drive
  2. Metadata caching, which has both Read and Write I/O  is targeted to a slight larger segment of the drive
  3. Data is laid out on the rest of the capacity of the drive

Drilling deeper, the write caching (in item 1 above) high I/O writes are targeted at the drive segment’s range which is over-provisioned for greater efficiency and care. At the same time, the garbage collection(GC) of this segment is handled by the respective drive’s controller. This is important because the controller will be performing the GC function without inducing unnecessary latency to the storage array processing cycles, giving further boost to Tegile’s already awesome prowess.

In addition to that, IntelliFlash ™ aligns every block and every page exactly to each segment and each page boundary of the drives. This reduces block and page segmentation, and thereby reduces issues with file locality and free blocks locality. It also automatically adjust its block and page alignments to different drive types and models. Therefore, I believe, it would know how to align itself to a 512-bytes or a 520-bytes sector drives.

The Media Management function also has advanced cell care. The wear-leveling takes on a newer level of advancement where how the efficient organization of blocks and pages to the drives reduces additional and often unnecessary erase and rewrites. Furthermore, the use of Inline Compression and Inline Deduplication also reduces the number of writes to drives media, increasing their longevity.

Tegile Inline Compression and Deduplication

Compression and deduplication are 2 very important technology features in almost all flash arrays. Likewise, these 2 technologies are crucial in the performance of Tegile storage systems. They are both inline i.e – Inline Compression and Inline Deduplication, and therefore both are boosted by the multi-core CPUs as well as the fast DRAM memory.

I don’t have the secret sauce formula of how Tegile designed their inline compression and deduplication. But there’s a very good article of how Tegile viewed their method of data reduction for compression and deduplication. Check out their blog here.

The metadata of data access of each and every customer is probably feeding into their Intellicare, a cloud-based customer care program. Intellicare is another a strong differentiator in Tegile’s offering.

Oh, did I mentioned they are unified storage as well with both SAN and NAS, including SMB 3.0 support?

I left Tegile that afternoon on November 5th feeling happy. I was pleased to catch up with Narayan Venkat, my old friend from NetApp, who is now their Chief Marketing Officer. I was equally pleased to see Tegile advancing ZFS further than the others I have known. With so much technological advancement and more coming, the world is their oyster.

Technology prowess of Riverbed SteelFusion

The Riverbed SteelFusion (aka Granite) impressed me the moment it was introduced to me 2 years ago. I remembered that genius light bulb moment well, in December 2012 to be exact, and it had left its mark on me. Like I said last week in my previous blog, the SteelFusion technology is unique in the industry so far and has differentiated itself from its WAN optimization competitors.

To further understand the ability of Riverbed SteelFusion, a deeper inspection of the technology is essential. I am fortunate to be given the opportunity to learn more about SteelFusion’s technology and here I am, sharing what I have learned.

What does the technology of SteelFusion do?

Riverbed SteelFusion takes SAN volumes from supported storage vendors in the central datacenter and projects the storage volumes (aka LUNs)to applications and hosts at the remote branches. The technology requires a paired relationship between SteelFusion Core (in the centralized datacenter) and SteelFusion Edge (at the branch). Both SteelFusion Core and Edge are fronted respectively by the Riverbed SteelHead WAN optimization device, to deliver the performance required.

The diagram below gives an overview of how the entire SteelFusion network architecture is like:

Riverbed SteelFusion Overall Solution 2 Continue reading

APIs that stick in Storage

The competition in storage networking and data management is forever going to get fiercer. And there is always going to be the question of either having open standards APIs or proprietary APIs because storage networking and data management technologies constantly have to balance between gaining a competitive advantage with proprietary APIs  or getting greater market acceptance with open standards APIs.

The flip side, is having proprietary APIs could limit and stunt the growth of the solution but with much better integration and interoperability with complementary solutions. Open standards APIs could make the entire market a plain, vanilla one where there is little difference between technology A or B or C or X, and in the long run, could give lesser incentive for technology innovation.

I am not an API guy. I do not code or do development work on APIs, but I do like APIs (Application Programming Interface). I have my fair share of APIs which can be considered open or proprietary depending on who you talk to. My understanding is that an API might be more open if there are many ISVs, developers and industry supporters endorsing it and have a valid (and usually profit-related) agenda to make the API open.

I can share some work experience with some APIs I have either worked in the past or give my views of some present cool APIs that are related to storage networking and data management.

One of the API-related works I did was with the EMC Centera. I was working with Schlumberger to create a file-level archiving/lifecycle management solution for the GeoFrame seismic files with the EMC Centera. This was back in 2008.

EMC Centera does not present itself as a NAS box (even though I believe, IDC lumps Centera sales numbers to worldwide NAS market figures, unless I am no longer correct chronologically) but rather through ISVs and application-level integration with the EMC Centera API. Here’s a high-level look of how the EMC Centera talks to application with the API.

Note: EMC Centera can also present a NAS integration interface through NFS, CIFS, HTTP and FTP protocols, but the customer must involve (may have to purchase) the EMC Centera Universal Access software appliance. This is for applications that do not have the level of development and integration to interface with the EMC Centera API. 

Continue reading

Chink in NetApp MetroCluster?

Ok, let me clear the air about the word “Chink” (before I get into trouble), which is not racially offensive unlike the news about ESPN having to fire 2 of their employees for using the word “Chink” on Jeremy Lin.  According to my dictionary (Collins COBUILD), chink is a very narrow crack or opening on a surface and I don’t really know the derogatory meaning of “chink” other than the one in my dictionary.

I have been doing a spot of work for a friend who has just recently proposed NetApp MetroCluster. When I was at NetApp many years ago, I did not have a chance to get to know more about the solution, but I do know of its capability. After 6 years away, coming back to do a bit of NetApp was fun for me, because I was always very comfortable with the NetApp technology. But NetApp MetroCluster, and in this opportunity, NetApp Fabric MetroCluster presented me an opportunity to get closer to the technology.

I have no doubt in my mind, this is one of the highest available storage solutions in the market, and NetApp is not modest about beating its own drums. It touts “No SPOF (Single Point of Failure“, and rightly so, because it has put in all the right plugs for all the points that can fail.

NetApp Fabric MetroCluster is a continuous availability solution that stretches over 100km. It is basically a NetApp Cluster with mirrored storage but with half of  its infrastructure mirror being linked very far apart, over Fibre Channel components and dark fiber. Here’s a diagram of how NetApp Fabric Metrocluster works for a VMware FT (Fault Tolerant) environment.

There’s a lot of simplicity in the design, because when I started explaining it to the prospect, I was amazed how easy it was to articulate about it, without all the fancy technical jargons or fuzz. I just said … “imagine a typical cluster, with an interconnect heartbeat, and the storage are mirrored. Then imagine the 2 halves are being pulled very far apart … That’s NetApp Fabric MetroCluster”. It was simply blissful.

But then there were a lot of FUDs (fear, uncertainty, doubt) thrown in by the competitor, feeding the prospect with plenty of ammunition. Yes, I agree with some of the limitations, such as no SATA support for now. But then again, there is no perfect storage solution. In fact, Chris Mellor of The Register wrote about God’s box, the perfect storage, but to get to that level, be prepared to spend lots and lots of money! Furthermore, once you fix one limitation or bottleneck in one part of the storage, it introduces a few more challenges here and there. It’s never ending!

Side note: The conversation triggered the team to check with NetApp for SATA support in Fabric MetroCluster. Yes, it is already supported in ONTAP 8.1 and the present version is 8.1RC3. Yes, SATA support will be here soon. 

More FUDs as we went along and when I was doing my research, some HP storage guys on the web were hitting at NetApp MetroCluster. Poor HP! If you do a search of NetApp MetroCluster, I am sure you will come across these 2 HP blogs in 2010, deriding the MetroCluster solution. Check out this and the followup on the first blog. What these guys chose to do was to break the MetroCluster apart into 2 single controllers after a network failure, and attack it from that level.

Yes, when you break up the halves, it is basically a NetApp system with several single point of failure (SPOF). But then again, who isn’t? Almost every vendor’s storage will have some SPOFs when you break the mirror.

Well, I can tell you is, the weakness of NetApp MetroCluster is, it’s not continuous data protection (CDP). Once your applications have written garbage on one volume, the garbage is reflected on the mirrored volume. You can’t roll back and you live with the data corruption. That is why storage vendors, including NetApp, offer snapshots – point-in-time copies where you can roll back to the point before the data corruption occurred. That is why CDP gives the complete granularity of recovery in every write I/O and that’s something NetApp does not have. That’s NetApp’s MetroCluster weakness.

But CDP is aimed towards data recovery, NOT data availability. It is focused on customers’ whose requirements are ability to get the data back to some usable state or form after the event of a disaster (big or small), while the MetroCluster solution is focused on having the data available all the time. They are 2 different set of requirements. So, it depends on what the customer’s requirement is.

Then again, come to think of it, NetApp has no CDP technology of their own … isn’t it?

Amazon makes it easy

I like the way Amazon is building their Cloud Computing services. Amazon Web Services (AWS) is certainly on track to become the most powerful Cloud Computing company in the world. In fact, AWS might already is.  But they are certainly not resting on their laurels when they launched 2 new services in as many weeks – Amazon DynamoDB (last week) and Amazon Storage Gateway (this week).

I am particularly interested in the Amazon Storage Gateway, because it is addressing one of the biggest fears of Cloud Computing head-on. A lot of large corporations are still adamant to keep their data on-premise where it is private and secure. Many large corporations are still very skeptical about it even though Cloud Computing is changing the IT landscape in a massive way. The barrier to entry for large corporations is not something easy, but Amazon is adapting to get more IT divisions and departments to try out Cloud Computing in a less disruptive way.

The new service, is really about data storage and data backup for large corporations. This is important because large corporations have plenty of requirements for data storage and data to be backed up. And as we know, a large portion of the data stored does not need to be transactional or to be accessed frequently. This set of data is usually less frequently used, for archiving or regulatory compliance reasons, particular in the banking and healthcare industry.

In the data backup operations, the reason data is backed up is to provide a data recovery mechanism when a disaster strikes. Large corporations back up tons of data every day, weeks or month and this data only has value when there is a situation that requires data relevance, data immediacy or data recovery. Otherwise, it is just plenty of data taking up storage space, be it on disk or on tape.

Both data storage and data backup cost a lot of money, both CAPEX and OPEX. In CAPEX, you are constantly pressured to buy more storage to store the ever growing data. This leads to greater management and administration costs, both contributing heavily into OPEX costs. And I have not included the OPEX costs of floor space, power and cooling, people (training, salary, time and so on) typically adding up to 3-5x the operations costs relative to the capital investments. Such a model of IT operations related to storage cannot continue forever, and storage in the Cloud offers an alternative.

These 2 scenarios – data storage and data backup – are exactly the type of market AWS is targeting. In order to simplify and pacify large corporations, AWS introduced the Amazon Storage Gateway, that eases the large corporations to take some of their IT storage operations to the Cloud in the form of Amazon S3.

The video below shows the Amazon Storage Gateway:

The Amazon Storage Gateway is a piece of software “appliance” that is installed on-premise in the large corporation’s data center. It seamlessly integrates into the LAN and provides a SSL (Secure Socket Layer) connection to the Amazon S3. The data being transferred to the S3 is also encrypted with AES (Advanced Encryption Standard) 256-bit. Both SSL and AES-256 can give customers a sense of security and AWS claims that the implementation meets the data storage and data recovery standards used in the banking and healthcare industries.

The data storage and backup service regularly protects the customer’s data in snapshots, and giving the customer a rapid recovery platform should the customer experienced on-premise data corruption or data disruption. At the same time, the snapshot copies in the Amazon S3 can also be uploaded into Amazon EBS (Elastic Block Store) and testing or development environments can be evaluated and testing with Amazon EC2 (Elastic Compute Cloud). The simplicity of sharing and combining different Amazon services will no doubt, give customers a peace of mind, easing their adoption of Cloud Computing with AWS.

This new service starts with a 60-day free trial and moving on to a USD$125.00 (about Malaysian Ringgit $400.00) per gateway per month subscription fee. The data storage (inclusive of the backup service), costs only 14 cents per gigabyte per month. For 1TB of data, that is approximately MYR$450 per month. Therefore, minus the initial setup costs, that comes to a total of MYR$850 per month, slightly over MYR$10,000 per year.

At this point, I like to relate an experience I had a year ago when implementing a so-called private cloud for an oil-and-gas customers in KL. They were using the HP EVS (Electronic Vaulting Service) to an undisclosed HP data center hosting site in the Klang Valley. The HP EVS, which was an OEM of Asigra, was not an easy solution to implement but what was more perplexing was the fact that the customer had a poor understanding of what would be the objectives and their 5-year plan in keeping with the data protected.

When the first 3-4TB data storage and backup were almost used up, the customer asked for a quotation for an additional 1TB of the EVS solution. The subscription for 1TB was MYR$70,000 per year. That is 7x time more than the AWS MYR$10,000 per year cost! I have to salute the HP sales rep. It must have been a damn good convincing sell!

In the long run, the customer could be better off running their storage and backup on-premise with their HP EVA4400 and adding an additional of 1TB (and hiring another IT administrator) would have cost a whole lot less.

Amazon Web Services has already operating in Singapore for the past 2 years, and I am sure they are eyeing Malaysia as their regional market. Unless and until Malaysian companies offering Cloud Services know to use economies-of-scale to capitalize the Cloud Computing market, AWS is always going to be a big threat to CSP companies in Malaysia and a boon of any companies seeking cloud computing services anywhere in the world.

I urge customers in Malaysia to start questioning their so-called Cloud Service Providers if they can do what AWS is doing. I have low confidence of what the most local “cloud computing” companies can deliver right now. I hope they stop window dressing their service offerings and start giving real cloud computing services to customers. And for customers, you must continue to research and find out more which cloud services meet your business objectives. Don’t be flashed by the fancy jargons or technical idealism thrown at you. Always, always find out more because your business cost is at stake. Don’t be like the customer who paid MYR$70,000 for 1TB per year.

AWS is always innovating and the Amazon Storage Gateway is just another easy-to-adopt step in their quest for world domination.

Snapshots? Don’t have a C-O-W about it!

Unfortunately, I am having a COW about it!

Snapshots are the inherent offspring of the copy-on-write technique used in shadow-paging filesystems. NetApp’s WAFL and Oracle Solaris ZFS are commercial implementations of shadow-paging filesystems and they are typically promoted as Copy-on-Write filesystems.

As we may already know, snapshots are point-in-time copy of the active file system in the storage world. They perform quick backup of the active file system by making a copy of the block addresses (pointers) of the filesystem and then updating the pointer maps to the inodes in the fsinfo root inode of the WAFL filesystem for new changes after the snapshot has been taken. The equivalent of fsinfo is the uberblock in the ZFS filesystem.

However, contrary to popular belief, the snapshots from WAFL and ZFS are not copy-on-write implementations even though the shadow paging filesystem tree employs the copy-on-write technique.

Consider this for a while when a snapshot is being taken … Copy —- On —- Write. If the definition is (1) Copy then (2) Write, this means that there are several several steps to perform a copy-on-write snapshot. The filesystem has to to make a copy of the original data block (1 x Read I/O), then write the original data block to a new location (1 x Write I/O) and then write the new data block to the location of the original data block (1 x Write I/O).

This is a 3-step process that can be summarized as

  1. Read location of original data block (1 x Read I/O)
  2. Copy this data block to new unused location (1 x Write I/O)
  3. Write the new and modified data block to the location of original data block (1 x Write I/O)

This implementation, IS THE copy-on-write technique for snapshot but NetApp and possibly Oracle guys have been saying for years that their snapshots are based on copy-on-write. This is pretty much a misnomer that needs to be corrected. EMC, in its SnapSure and SnapView implementation, called this technique Copy-on-First-Write (COFW), probably to avoid the confusion. The data blocks are copied to a savvol, a separate location to store the changes of snapshots and defaults to 10% of the total capacity of their storage solutions.

As you have seen, this method is a 3 x I/O operation and it is an expensive solution. Therefore, when we compare the speed of NetApp/ZFS snapshots to EMC’s snapshots, the EMC COFW snapshot technique will be a tad slower.

However, this method has one superior advantage over the NetApp/ZFS snapshot technique. The data blocks in the active filesystem are almost always laid out in a more contiguous fashion, resulting in a more consistent read performance throughout the life of the active file system.

Below is a diagram of how copy-on-write snapshots are implemented:


What is NetApp/ZFS’s snapshot method then?

It is is known as Redirect-on-Write. Using the same step … REDIRECT —- ON —– WRITE. When a data block is about to be modified, the original data block is read (1 x Read I/O) and then the data block is written to a new location (1 x Write I/O). The active file system then updates the filesystem tree and its inode address to reflect the location of the new data block. The original data block remained unchanged.

In summary,

  1. Read location of original data block (1 x Read I/O)
  2. Write modified data block to new location (1 x Write I/O)

The Redirect-on-Write method resulted in 1 Write I/O less, making snapshot creation faster. This is the NetApp/ZFS method and it is superior when compared to the Copy-on-Write snapshot technique discussed earlier.

However, as the life of the filesystem progresses, fragmentation and holes will cause the performance of the active filesystem to degrade. The reason is most related data blocks are no longer contiguous and the active file system will be busy seeking the scattered data blocks across the volume. Fragmented filesystem would have to be “cleaned and reorganized” to regain its performance lustre.

Another unwanted problem using the Redirect-on-Write snapshot technique is the snapshot resides in the same boundary as the active filesystem. Over time, if the capacity consumed by the snapshots could overwhelm the active filesystem, if their recycle schedule is unchecked.

I guess this is a case of “SUFFER NOW/ENJOY LATER” or “ENJOY NOW/SUFFER LATER”. We have to make a conscious effort to understand what snapshots are all about.

Can snapshots replace traditional backups?

Backup is necessary evil. In IT, every operator, administrator, engineer, manager, and C-level executive knows that you got to have backup. When it comes to the protection of data and information in a business, backup is the only way.

Backup has also become the bane of IT operations. Every product that is out there in the market is trying to cram as much production data to backup as possible just to fit into the backup window. We only have 24 hours in a day, so there is no way the backup window can be increased unless

  • You reduce the size of the primary data to be backed up – think compression, deduplication, archiving
  • You replicate the primary data to a secondary device and backup the secondary device – which is ironic because when you replicate, you are creating a copy of the primary data, which technically is a backup. So you are technically backing up a backup
  • You speed up the transfer of primary data to the backup device

Either way, the IT operations is trying to overcome the challenges of the backup window. And the whole purpose for backup is to be cock-sure that data can be restored when it comes to recovery. It’s like insurance. You pay for the premium so that you are able to use the insurance facility to recover during the times of need. We have heard that analogy many times before.

On the flip side of the coin, a snapshot is also a backup. Snapshots are point-in-time copies of the primary data and many a times, snapshots are taken and then used as the source of a “true” backup to a secondary device, be it disk-based or tape-based. However, snapshots have suffered the perception that it is a pseudo-backup, until recent last couple of years.

Here are some food for thoughts …

WHAT IF we eliminate backing data to a secondary device?

WHAT IF the IT operations is ready to embrace snapshots as the true backup?

WHAT IF we rely on snapshots for backup and replicated snapshots for disaster recovery?

First of all, it will solve the perennial issues of backup to a “secondary device”. The operative word here is the “secondary device”, because that secondary device is usually external to the primary storage.

Tape subsystems and tape are constantly being ridiculed as the culprit of missing backup windows. Duplications after duplications of the same set of files in every backup set triggered the adoption of deduplication solutions from Data Domain, Avamar, PureDisk, ExaGrid, Quantum and so on. Networks are also blamed because network backup runs through the LAN. LANless backup will use another conduit, usually Fibre Channel, to transport data to the secondary device.

If we eliminate the “secondary device” and perform backup in the primary storage itself, then networks are no longer part of the backup. There is no need for deduplication because the data could already have been deduplicated and compressed in the primary storage.

Note that what I have suggested is to backup, compress and dedupe, AND also restore from the primary storage. There is no secondary storage device for backup, compress, dedupe and restore.

Wouldn’t that paint a better way of doing backup?

Snapshots will be the only mechanism to backup. Snapshots are quick, usually in minutes and some in seconds. Most snapshot implementations today are space efficient, consuming storage only for delta changes. The primary device will compress and dedupe, depending on the data’s characteristics.

For DR, snapshots are shipped to a remote storage of equal prowess at the DR site, where the snapshot can be rebuild and be in a ready mode to become primary data when required. NetApp SnapVault is one example. ZFS snapshot replication is another.

And when it comes to recovery, quick restores of primary data will be from snapshots. If the primary storage goes down, clients and host initiators can be rerouted quickly to the DR device for services to resume.

I believe with the convergence of multi-core processing power, 10GbE networks, SSDs, very large capacity drives, we could be seeing a shift in the backup design model and possible the entire IT landscape. Snapshots could very likely replace traditional backup in the near future, and secondary device may be a thing of the past.