SMB on steroids but CIFS lord isn’t pleased

I admit it!

I am one of the guilty parties who continue to use CIFS (Common Internet File System) to refer to the Windows file sharing protocol. And a lot of vendors continue to use the word “CIFS” loosely without knowing that it is something from a bygone era. One of my friends even pronounced it “See Fist”, which sounded even funnier when he said it. (This one is for you, Adrian M!)

And we couldn’t be more wrong, because we shouldn’t be using the word CIFS anymore. It is so ’90s, man! The tell-tale signs have been there for a while, but most of us chose to ignore them with gusto. A recent SNIA webinar titled “SMB 3.0 – New opportunities for Windows Environment” aims to dispel our ignorance and change our CIFS-venture to the correct word: SMB (Server Message Block).

Dennis Chapman, Senior Technical Director for Microsoft Solutions at NetApp, whose selfie photo appears in the SNIA webinar slides above, wants to inform all of us that …

SMB History

VMware in step 1 breaking big 6 hegemony

Happy Lunar New Year! This is the Year of the Water Snake, which just commenced 3 days ago.

I have always maintained that VMware has the power to become a storage killer. I called it a silent storage killer in my blog post many moons ago.

And this week, VMware is not so silent anymore. Earlier this week, VMware acquired Virsto, a storage hypervisor technology company. News of the acquisition is plentiful on the web and can be found here and here. VMware is seriously pursuing its “Software-Defined Data Center (SDDC)” agenda. Having completed the software-defined networking component with the acquisition of Nicira back in July 2012, VMware now gets, in Virsto, another bedrock component of SDDC: software-defined storage.

Who is Virsto and what do they do? Well, in a nutshell, they abstract the underlying storage architecture and present a single, global namespace for storage: a big storage pool for VM datastores. I got to know about them last year, when I was researching the topic of storage virtualization.

I was looking at Datacore first, because I was familiar with it. I got to know Roni Putra, Datacore’s CTO, through a mutual friend when he was back in Malaysia. There was a sense of pride knowing that Roni is a Malaysian. That was back in 2004. But Datacore isn’t the only player in the game; the market is teeming with folks like Tintri, Nutanix, IBM, HDS and many more. It just so happens that Virsto caught the eye of VMware as VMware embarks on its first high-profile step into the storage game (the step where VMware literally treads on the toes of the Storage Big 6). The Big 6 are EMC, NetApp, IBM, HP, HDS and Dell (maybe I should include Fujitsu as well, since it has been taking market share of late).

Virsto installs as a VSA (virtual storage appliance) into ESXi, and in version 2.0, it plugs right in as an almost-native feature of ESXi rather than as a vCenter tab like most other storage plug-ins. It looks and feels very much like native vSphere functionality, and this blurs the line between storage and VM management. The vSphere administrator only needs to get involved in storage administration when he or she is provisioning or expanding storage. Those are the only 2 common “touch-points” a vSphere administrator has with storage, and this simplifies the administration and management job.

Here’s a look at the Virsto Storage Hypervisor architecture (credits to Google Images):

What Virsto does, as I understand it at a high level, is take any commodity storage, put a virtual storage layer over it and consolidate it into a very large storage pool. The storage pool is called vSpace (previously known as LiveSpace?), and it “allocates” Virsto vDisks to each VM. Each Virsto vDisk looks like a native zeroed thick VMDK, with the space efficiency of Linked Clones but without the performance penalty of provisioning them. The Virsto vDisks are presented as NFS exports to each VM.
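To make “the space efficiency of Linked Clones” a little more concrete, here is a minimal Python sketch of the general linked-clone technique: each clone shares a read-only base image and stores only the blocks it overwrites. The class names and the 4 KB block size are my own hypothetical choices; this is not Virsto’s vDisk format, which I have no details on.

```python
from typing import Dict

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


class BaseImage:
    """Shared, read-only golden image: block number -> block contents."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = dict(blocks)


class CloneDisk:
    """A clone that stores only the blocks it has overwritten (its delta)."""

    def __init__(self, base: BaseImage) -> None:
        self.base = base
        self.delta: Dict[int, bytes] = {}

    def read(self, block: int) -> bytes:
        # Prefer the clone's own copy, otherwise fall through to the base.
        if block in self.delta:
            return self.delta[block]
        # Blocks never written anywhere read back as zeroes.
        return self.base.blocks.get(block, bytes(BLOCK_SIZE))

    def write(self, block: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("this sketch only accepts whole-block writes")
        # Redirect the write into the delta; the base image is never modified.
        self.delta[block] = data

    def allocated_blocks(self) -> int:
        return len(self.delta)


base = BaseImage({0: b"A" * BLOCK_SIZE, 1: b"B" * BLOCK_SIZE})
vm1, vm2 = CloneDisk(base), CloneDisk(base)
vm1.write(1, b"X" * BLOCK_SIZE)
```

Both clones start out consuming no space of their own, and a write to one clone never changes what the other clone sees.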

Another important component is the asynchronous write to Virsto vLogs. The vLog is configured at the deployment stage and is basically a software-based write cache: it acknowledges all writes quickly to optimize write performance, then de-stages them asynchronously to the vSpace in the background. Obviously Virsto will have its own “secret sauce” to optimize the writes.
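A software write cache of this kind can be sketched in a few lines of Python: writes are appended to a sequential log and acknowledged straight away, while a background thread de-stages them to the pool. The names (`WriteLog`, `flush`) are hypothetical, a plain dict stands in for the vSpace, and none of Virsto’s “secret sauce” (coalescing, persistence or replay of the log) is modelled here.

```python
import queue
import threading
from typing import Dict, List, Tuple


class WriteLog:
    """Toy write-back log: acknowledge on append, de-stage in the background."""

    def __init__(self, pool: Dict[int, bytes]) -> None:
        self.pool = pool                        # stand-in for the vSpace
        self.log: List[Tuple[int, bytes]] = []  # stand-in for the vLog
        self._pending: queue.Queue = queue.Queue()
        worker = threading.Thread(target=self._destage, daemon=True)
        worker.start()

    def write(self, block: int, data: bytes) -> str:
        # A sequential append is cheap, so the write is acknowledged at once.
        self.log.append((block, data))
        self._pending.put((block, data))
        return "ack"

    def _destage(self) -> None:
        # Background thread: apply logged writes to the pool in log order,
        # so a later write to the same block replaces an earlier one.
        while True:
            block, data = self._pending.get()
            self.pool[block] = data
            self._pending.task_done()

    def flush(self) -> None:
        # Wait until every acknowledged write has been de-staged.
        self._pending.join()
```

A fuller design would also serve reads of not-yet-de-staged blocks from the log and trim the log once its entries have reached the pool.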

Within the vSpace, which Virsto organizes internally as disk clone groups, storage-related features such as tiering, thin provisioning, cloning and snapshots are part and parcel of the package. Other strong features of Virsto are its workflow wizard for storage provisioning and its intuitive built-in performance and management console.

As with most technology acquisitions, the company will eventually come to a fork where it has to decide which way to go. VMware has experienced this before with its Nicira acquisition. It had to decide between VXLAN (an IETF standard popularized by Cisco) and Nicira’s own STT (Stateless Transport Tunneling). There is no clear winner, because choosing one over the other has its rewards and losses.

Likewise, VMware will have to package the Virsto acquisition in a friendly manner. It does not want to step on the toes of all its Big 6 storage partners (yet). It still has to abide by some industry “co-opetition” game rules, but it has started the ball rolling.

And I see 2 critical disruptive points in this acquisition:

  1. It has endorsed software-defined storage/storage hypervisor/storage virtualization technology and started the commodity storage hardware wave. This could be the beginning of the end of proprietary storage hardware. The wave is also helped by other factors such as Facebook’s Open Compute Project. Read my blog post here.
  2. It is pushing VMware towards a monopoly à la Microsoft of yesteryear. But this time around, Microsoft Hyper-V could be the beneficiary of the VMware agenda. No wonder VMware needs to restructure and streamline its business. News of VMware laying off about 900 staff can be read here, and the unfavourable news of its share price going down can be read here.

I am sure the Storage Big 6 are on the alert and are probably already building other technologies and partnerships beyond VMware. It’s the natural thing to do, but there is no stopping VMware if it wants to step on the Big 6’s toes now!

Protogon File System

I was out shopping yesterday and I was tempted to have lunch at Bar-B-Q Plaza, a popular Thai, Japanese-style hot plate barbeque restaurant in this neck of the woods. The mascot of this restaurant is Bar-B-Gon, a dragon-like character and it is obviously a word play of barbeque and dragon.

As I was reading the news this morning about the upcoming Windows Server 8 launch, I found out that the ever-popular, often-ridiculed NTFS (NT File System) of Windows will be going away. It will be replaced by Protogon, the codename for the new file system that Microsoft is about to release. Protogon? A wordplay on prototype and dragon?

The new file system, which is backward compatible with NTFS, will be called ReFS, or Resilient File System. The design objectives of what Microsoft calls a “next generation” file system are clear and well suited to present-day requirements. I deliberately say present-day requirements, because when I went through the key features of ReFS, the concepts and ideas were not exactly “next generation”. Many of these features are already offered by most storage vendors we know of, but to people in the Windows world, they might sound “next generation”.

ReFS, to me, is about time. NTFS has been around for a long, long time. It first appeared in the wild in 1993 with Windows NT 3.1 and gained prominence and wide acceptance in Windows 2000 as the “enterprise-ready” file system. Indeed it was, because that was when Microsoft Windows began its march into the data centers while the Unix vendors were still bickering over their versions of open standards. Active Directory (AD) and NTFS were the 2 key technologies that slowly, but surely, eroded Unix’s strengths in the data centers.

But over the years, as storage networking technologies like SAN and NAS developed and matured, I saw NTFS being little developed to exploit the strengths of these technologies and the relevant protocols in the data world. When I did a little bit of system administration on Windows (2000 and 2003 notably), I could feel that NTFS was developed with direct-attached storage (DAS) or internal disks in mind. It was definitely not taking full advantage of the strengths of Fibre Channel or iSCSI SAN. It was only in Windows Server 2008 that I felt Microsoft had finally had enough of pussyfooting around SAN and NAS, and introduced more decent disk storage management with features that work well natively with SAN. Now, Microsoft can no longer sit quietly without acknowledging the need to build enterprise-ready technologies for storage networking and data management. And the core of the new Windows Server 8 engine for that is ReFS.

One of the key technology objectives in the design of ReFS is backward compatibility. Windows has a huge market to address, and Microsoft cannot just shove NTFS aside. The way they did it was to keep the upper-level API and file semantics and put a new core file system engine underneath, as shown in the diagram below:

ReFS is positioned with resiliency in mind. Here are a few resilient features:

  • Ability to isolate faults and salvage data on parts of the file system without taking the entire file system or volume offline. The goal of ReFS here is to be ONLINE and serving data all the time!
  • Checksumming of data and metadata for integrity. It verifies all data and, in some cases, auto-corrects corrupted data.
  • Optional integrity streams that protect against all forms of file-level data corruption. When enabled, whenever a file is changed, the modified copy is written to a different area of the disk from the original file. This way, even if the write operation is interrupted and the modified file is lost, the original file is still intact. (Doesn’t this sound like COW with snapshots?) When combined with Storage Spaces (we will talk about this later), which can store a copy of all files in a storage array on more than one physical disk, ReFS gives Windows a way to automatically find and open an uncorrupted version of a file in the event that a file on one of the physical disks becomes corrupted. Microsoft does not recommend integrity streams for applications or systems with a specific storage layout, or for applications that want finer control over disk storage, databases for example.
  • Data scrubbing for latent disk errors. A tool, integrity.exe, runs and manages the data scrubbing and integrity policies. The file attribute FILE_ATTRIBUTE_NO_SCRUB_DATA allows certain applications to opt out of scrubbing and control integrity policies beyond what ReFS has to offer.
  • Shared storage pools across machines for additional fault tolerance and load balancing (à la Oracle RAC perhaps?)
  • Protection against bit rot, i.e. silent data corruption, which I blogged about many, many moons ago.
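The combination of checksums, copy-on-write updates and a mirrored copy described in the list above can be illustrated with a toy Python model: every update goes to a fresh location, the file pointer flips only after the new copy is written, and a read that finds a bad checksum falls back to the mirror and repairs the bad copy. The class name and layout are hypothetical, and CRC32 (Python’s `zlib.crc32`) is just a convenient stand-in checksum; this is not ReFS’s on-disk format or algorithm.

```python
import zlib
from typing import Dict, List, Tuple


class MirroredCowStore:
    """Toy model: checksummed, copy-on-write writes over an N-way mirror."""

    def __init__(self, copies: int = 2) -> None:
        self.disks: List[Dict[int, bytes]] = [{} for _ in range(copies)]
        self.index: Dict[str, Tuple[int, int]] = {}  # name -> (location, crc32)
        self.next_loc = 0

    def write(self, name: str, data: bytes) -> None:
        loc = self.next_loc  # always a fresh location, never overwrite in place
        self.next_loc += 1
        for disk in self.disks:  # place the new copy on every mirror
            disk[loc] = data
        # Commit point: only now does the file point at the new version,
        # so an interrupted update leaves the old version intact.
        self.index[name] = (loc, zlib.crc32(data))

    def read(self, name: str) -> bytes:
        loc, crc = self.index[name]
        for disk in self.disks:
            data = disk.get(loc)
            if data is not None and zlib.crc32(data) == crc:
                # Heal any mirror copy that fails verification.
                for other in self.disks:
                    if other.get(loc) != data:
                        other[loc] = data
                return data
        raise IOError(f"all copies of {name!r} are corrupt")


store = MirroredCowStore()
store.write("a.txt", b"hello")
old_loc = store.index["a.txt"][0]
store.write("a.txt", b"hello v2")
new_loc = store.index["a.txt"][0]
store.disks[0][new_loc] = b"hellX v2"  # simulate bit rot on one mirror
```

Reading `a.txt` now detects the corrupted copy on the first disk, returns the good copy from the second disk and rewrites the bad one, while the pre-update version still sits untouched at its old location.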

End-to-end resilient architecture is the goal in mind.

From a file structure standpoint, here’s what ReFS looks like:

ReFS is Copy-on-Write (COW). As you know, I am a big fan of file systems in general, but COW is the design I am most familiar with. NetApp’s Data ONTAP, Oracle Solaris ZFS and the upcoming Linux BTRFS are all implementations of COW. Similar to BTRFS, ReFS uses a B+ tree implementation, and as described in Wikipedia,

ReFS uses B+ trees for all on-disk structures including metadata and file data. The file size, total volume size, number of files in a directory and number of directories in a volume are limited by 64-bit numbers, which translates to maximum file size of 16 Exbibytes, maximum volume size of 1 Yobibyte (with 64 KB clusters), which allows large scalability with no practical limits on file and directory size (hardware restrictions still apply). Metadata and file data are organized into tables similar to relational database. Free space is counted by a hierarchal allocator which includes three separate tables for large, medium, and small chunks. File names and file paths are each limited to a 32 KB Unicode text string.
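The headline limits in the quote follow from simple powers of two. Here is a quick Python check, assuming (my reading of the quote) that the file limit is a 64-bit byte count and the volume limit is a 64-bit count of 64 KB clusters:

```python
KIB = 2**10
EIB = 2**60  # exbibyte
YIB = 2**80  # yobibyte

max_file_bytes = 2**64                    # 64-bit byte count for a file
cluster_bytes = 64 * KIB                  # 64 KB clusters
max_volume_bytes = 2**64 * cluster_bytes  # 64-bit cluster count

print(max_file_bytes // EIB, "EiB")    # 16 EiB
print(max_volume_bytes // YIB, "YiB")  # 1 YiB
```

In other words, 2^64 bytes is 16 EiB, and 2^64 clusters of 2^16 bytes each is 2^80 bytes, exactly 1 YiB.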

Alongside ReFS, Microsoft introduces Storage Spaces. The concept is very, very similar to ZFS, with a seamless implementation of a volume manager, RAID management and a highly resilient file system. And ZFS is 10 years old. So much for ReFS being “next generation”. But here is a series of screenshots of what Storage Spaces looks like:

Similar to the “flexible volume management” of ONTAP FlexVol and ZFS file systems, you can add disk drives on the fly and grow your volumes online, in real time.
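The “add disks, grow volumes online” model can be sketched as a toy Python pool in which disks contribute raw capacity and volumes draw on it. Everything here is hypothetical and heavily simplified: capacities are in GB, volumes are fully provisioned, and RAID or mirroring overhead is ignored, so this is not how Storage Spaces, FlexVol or ZFS actually account for space.

```python
from typing import Dict


class StoragePool:
    """Toy pool: disks add raw capacity, volumes are carved out of the total."""

    def __init__(self) -> None:
        self.disks: Dict[str, int] = {}    # disk name -> capacity (GB)
        self.volumes: Dict[str, int] = {}  # volume name -> size (GB)

    @property
    def free_gb(self) -> int:
        return sum(self.disks.values()) - sum(self.volumes.values())

    def add_disk(self, name: str, size_gb: int) -> None:
        # New capacity joins the pool at once; existing volumes stay online.
        self.disks[name] = size_gb

    def create_volume(self, name: str, size_gb: int) -> None:
        self._check(size_gb)
        self.volumes[name] = size_gb

    def extend_volume(self, name: str, extra_gb: int) -> None:
        # Growing a volume is only a bookkeeping change against pool free space.
        self._check(extra_gb)
        self.volumes[name] += extra_gb

    def _check(self, size_gb: int) -> None:
        if size_gb > self.free_gb:
            raise ValueError(f"pool has only {self.free_gb} GB free")


pool = StoragePool()
pool.add_disk("d1", 1000)
pool.add_disk("d2", 1000)
pool.create_volume("data", 1500)
```

At this point extending `data` by 600 GB fails because only 500 GB is free; after `pool.add_disk("d3", 1000)` the same extension succeeds without touching the existing volume.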

ReFS inherits many NTFS features as it inches towards the Windows Server 8 launch date. Features mentioned include BitLocker encryption, Access Control Lists (ACLs) for security (naturally), Symbolic Links, Volume Snapshots, File IDs and Opportunistic Locking (Oplocks).

ReFS is intended to scale, as Microsoft puts it, “to extreme limits”. Here is a table describing those limits:

ReFS will certainly bring Windows closer to the stringent availability and performance requirements of modern-day file systems, but the storage networking world is also evolving into the cloud computing space. Object-based file systems are getting involved too, as market trends dictate new requirements, and file systems must continue to evolve in order to survive.

Microsoft took a long time to get from NTFS to its present file system, ReFS, but can Microsoft continue to innovate and change the rules of the data storage game? We shall see …

Primary Dedupe where are you?

I am a bit surprised that primary storage deduplication has not taken off in a big way, unlike the buzz when deduplication first came onto the scene about 4 years ago.

When the first deduplication solutions came out, they were aimed squarely at backup data. Now more popularly known as secondary data deduplication, the technology reduced the inefficiencies of backup and helped spark the frenzy of adulation for companies like Data Domain, Exagrid, Sepaton and Quantum a few years ago. The software vendors were not left out either. Symantec, Commvault and everyone else in town had data deduplication for backup and archiving.

It was no surprise that EMC battled NetApp and finally won the right to acquire Data Domain for USD 2.4 billion in 2009. Today, in my opinion, the landscape of secondary data deduplication has pretty much settled and matured. Practically everyone has some sort of secondary data deduplication technology or solution in place.

But primary data deduplication hardly causes a ripple today compared with a few years ago, especially here in Malaysia. Yeah, the IT crowd is pretty fickle that way, because most tend to follow the trend of the moment. Last year it was Cloud Computing and now the big buzz word is Big Data.

We are here to look at technologies that solve problems, folks, and primary data deduplication solutions should be considered in any IT planning. And it is our job as storage networking professionals to keep advising customers about what is relevant to their business and to address their pain points.

I get a bit cheesed off that companies like EMC or HDS continue to spend their marketing dollars hyping the trends of the moment rather than using some of those funds to promote good technologies, such as primary data deduplication, that solve real-life problems. The same goes for most IT magazines, publications and other communication media, which rarely give space to technologies that solve problems on the ground and just harp on hype, fuzz and buzz. It gets a bit too ordinary (and mundane) when they try too hard to be extraordinary, because everyone is basically talking about the same freaking thing at the same time, over and over again. (Hmmm … I think I am speaking off topic now .. I better shut up!)

We are facing an avalanche of data. The other day, the CEO of Nexenta used the term “data tsunami”, but whatever term is used does not matter. There is too much data. Secondary data deduplication solved one part of the problem, and now it’s time to talk about the other part, data in primary storage, hence primary data deduplication.

What is out there? Who’s doing what in terms of primary data deduplication?

NetApp has had A-SIS (now NetApp Dedupe) for years, and they are good in my books. They talk to customers about the benefits of deduplication on their FAS filers. (Side note: I am seeing more benefits from data compression in primary storage, but I am not going to go there in this entry.) EMC had primary data deduplication in Celerra years ago, but they hardly talk about it. It’s on their VNX as well, but again, nobody in EMC ever speaks about their primary deduplication feature.

I have always loved Ocarina Networks’ ECO technology, but Dell hasn’t given much of a hoot about Ocarina since acquiring it in 2010. The technology surfaced a few months ago in the Dell DX6000G Storage Compression Node for its Object Storage Platform, but then again, all Dell talks about is its Fluid Data Architecture from the Compellent division. Hey Dell, you guys are so one-dimensional! Ocarina is a wonderful gem in their jewel case, and yet all their storage guys talk about is Compellent and EqualLogic.

Moving on … I ought to knock Oracle on the head too. ZFS has great data deduplication technology that is meant for primary data and a couple of years back, Greenbytes took that and made a solution out of it. I don’t follow what Greenbytes is doing nowadays but I do hope that the big wave of primary data deduplication will rise for companies such as Greenbytes to take off in a big way. No thanks to Oracle for ignoring another gem in ZFS and wasting their resources on pre-sales (in Malaysia) and partners (in Malaysia) that hardly know much about the immense power of ZFS.

But an unexpected source, Microsoft, could help trigger greater interest in primary data deduplication. I have just read that the next version of the Windows Server OS will have primary data deduplication integrated into NTFS. The feature will be available in Windows Server 8, and the architectural view is shown below:

The primary data deduplication in NTFS will be a feature add-on for Windows Server users. It is implemented as a filter driver on a per-volume basis, with each volume a complete, self-describing unit. It is cluster aware and fully crash consistent on all operations.

The technology is Microsoft’s own, built from scratch, and it will help position Hyper-V as a strong enterprise choice in its battle with VMware for the server virtualization space. Mind you, VMware already has a big, big lead, and this is a do-or-die move for Microsoft to keep Hyper-V in the catch-up game. Otherwise, the gap between Microsoft and VMware in server virtualization will grow even wider.

I don’t have the full details, but I read that NTFS primary deduplication chunk sizes will be between 32KB and 128KB, and that deduplication will be done as post-processing.
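I have not read how Microsoft picks its chunk boundaries, so the following is a minimal Python sketch of one common technique, content-defined chunking with a Gear-style rolling hash, bounded by the reported 32KB and 128KB chunk sizes and run as a post-processing pass over data that is already stored. The boundary mask, the SHA-256 chunk fingerprints and all function names are my own assumptions, not Microsoft’s design.

```python
import hashlib
import random
from typing import Dict, List, Tuple

MIN_CHUNK = 32 * 1024   # lower bound from the reported 32KB-128KB range
MAX_CHUNK = 128 * 1024  # upper bound from the reported range
MASK = (1 << 16) - 1    # hypothetical boundary mask (about 1 cut per 64KB)

_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # fixed pseudo-random table


def chunk_boundaries(data: bytes) -> List[int]:
    """Return end offsets of content-defined chunks (Gear rolling hash)."""
    cuts: List[int] = []
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        # Cut where the content says so, but never below MIN or above MAX.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            cuts.append(i + 1)
            start, h = i + 1, 0
    if start < len(data):
        cuts.append(len(data))  # trailing partial chunk
    return cuts


def dedupe_volume(
    files: Dict[str, bytes]
) -> Tuple[Dict[str, bytes], Dict[str, List[str]]]:
    """Post-processing pass: keep one copy of each unique chunk."""
    store: Dict[str, bytes] = {}        # chunk store keyed by SHA-256 digest
    recipes: Dict[str, List[str]] = {}  # file -> ordered chunk digests
    for name, data in files.items():
        start, recipe = 0, []
        for end in chunk_boundaries(data):
            piece = data[start:end]
            digest = hashlib.sha256(piece).hexdigest()
            store.setdefault(digest, piece)  # duplicates are stored only once
            recipe.append(digest)
            start = end
        recipes[name] = recipe
    return store, recipes


def rehydrate(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    """Rebuild a file from its chunk recipe."""
    return b"".join(store[d] for d in recipe)
```

Because boundaries depend on content rather than fixed offsets, inserting a few bytes at the start of a file tends to disturb only the first chunk or so; later chunks usually line up with ones already in the store.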

With Microsoft introducing its technology soon, I hope primary data deduplication will get some well-deserved accolades, because I think most companies are really not doing justice to the great technologies they have in their jewel cases. And I hope Microsoft, with all its marketing savviness and adeptness, will do justice to a technology that solves real-life data problems.

I bid you good luck, Primary Data Deduplication! You deserve better.