Protogon File System

I was out shopping yesterday and I was tempted to have lunch at Bar-B-Q Plaza, a popular Thai chain of Japanese-style hot plate barbeque restaurants in this neck of the woods. The mascot of this restaurant is Bar-B-Gon, a dragon-like character, and the name is obviously a word play on barbeque and dragon.

As I was reading the news this morning about the upcoming Windows Server 8 launch, I found out that the ever popular, often ridiculed NTFS (NT File System) of Windows will be going away. It will be replaced by a new file system that Microsoft is about to release, codenamed Protogon. Protogon? A word play on prototype and dragon?

The new file system, which maintains backward compatibility with NTFS, will be called ReFS, or Resilient File System. The design objectives of what Microsoft calls its “next generation” file system are clear and attuned to present day requirements. I say present day requirements for a reason, because when I went through the key features of ReFS, the concepts and the ideas are not exactly “next generation”. Many of these features are already present with most storage vendors we know of, but perhaps to people in the Windows world, these features might sound “next generation”.

ReFS, to me, is about time. NTFS has been around for a long, long time. It was first seen in the wild in 1993, and gained prominence and wide acceptance in Windows 2000 as the “enterprise-ready” file system. Indeed it was, because that was when Microsoft Windows started its march of dominance into the data centers, while the Unix vendors were still bickering about their versions of open standards. Active Directory (AD) and NTFS were the 2 key technologies that slowly, but surely, chipped away at Unix’s strengths in the data centers.

But over the years, as storage networking technologies like SAN and NAS were developing and maturing, I saw little development in NTFS to match the strengths of these storage networking technologies and the relevant protocols in the data world. When I did a little bit of system administration on Windows (2000 and 2003, notably), I could feel that NTFS was developed with direct-attached storage (DAS) or internal disks in mind, and definitely not taking full advantage of the strengths of Fibre Channel or iSCSI SAN. It was only in Windows Server 2008 that I felt Microsoft finally had enough of pussyfooting with SAN and NAS, and introduced a more decent disk storage management that incorporates features which work well natively with SAN. Now, Microsoft can no longer sit quietly without acknowledging the need to build enterprise-ready technologies for storage networking and data management. And the core of the new Microsoft Windows Server 8 engine for that is ReFS.

One of the key technology objectives in the design of ReFS is backward compatibility. Windows has a huge market to address and Microsoft cannot just shove NTFS away. The way they did it was to maintain the upper level APIs and file semantics while building a new core file system engine underneath, as shown in the diagram below:

ReFS is positioned with resiliency in mind. Here are a few resilient features:

  • Ability to isolate faults and perform data salvage on parts of the file system without taking the entire file system or volume offline. The goal of ReFS here is to be ONLINE and serving data all the time!
  • Checksumming data and metadata for integrity. It verifies all data, and in some cases, auto-corrects corrupted data.
  • Optional integrity streams that ensure protection against all forms of file-level data corruption. When enabled, whenever a file is changed, the modified copy is written to a different area of the disk than that of the original file. This way, even if the write operation is interrupted and the modified file is lost, the original file is still intact. (Doesn’t this sound like COW with snapshots? A small sketch of the write-elsewhere idea follows this list.) When combined with Storage Spaces (we will talk about this later), which can store a copy of all files in a storage array on more than one physical disk, ReFS gives Windows a way to automatically find and open an uncorrupted version of a file in the event that a file on one of the physical disks becomes corrupted. Microsoft does not recommend integrity streams for applications or systems with a specific type of storage layout, or for applications that want tighter control over the disk storage, for example databases.
  • Data scrubbing for latent disk errors. There is a tool, integrity.exe, which runs and manages the data scrubbing and integrity policies. A file attribute, FILE_ATTRIBUTE_NO_SCRUB_DATA, allows certain applications to skip this option and control integrity policies beyond what ReFS has to offer.
  • Shared storage pools across machines for additional fault tolerance and load balancing (a la Oracle RAC, perhaps?)
  • Protection against bit rot and silent data corruption, which I have blogged about many, many moons ago.
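Here is a minimal sketch of the checksum-and-write-elsewhere idea behind the integrity features above. This is my own toy illustration in Python, not how ReFS is actually implemented:

import hashlib, os

# Toy "disk": a dict of location -> (checksum, data). Not ReFS internals,
# just the idea of checksummed data written to a fresh location on change.
disk = {}
pointer_table = {}   # file name -> current location on the "disk"

def write_file(name, data: bytes):
    loc = os.urandom(8).hex()                      # always a fresh location
    disk[loc] = (hashlib.sha256(data).hexdigest(), data)
    pointer_table[name] = loc                      # publish only after the write lands

def read_file(name) -> bytes:
    checksum, data = disk[pointer_table[name]]
    if hashlib.sha256(data).hexdigest() != checksum:
        raise IOError(f"{name}: checksum mismatch, data is corrupted")
    return data

write_file("report.docx", b"version 1")
write_file("report.docx", b"version 2")            # old blocks are left untouched
print(read_file("report.docx"))                    # b'version 2'

If the second write_file() dies halfway, pointer_table still points at the intact version 1 data, which is the gist of the integrity stream behaviour described above.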

End-to-end resilient architecture is the goal in mind.

From a file structure standpoint, here is what ReFS looks like:

ReFS is Copy-on-Write (COW). As you know, I am a big fan of file systems, and COW is the approach I am most familiar with. NetApp’s Data ONTAP, Oracle Solaris ZFS and the upcoming Linux BTRFS are all implementations of COW. Similar to BTRFS, ReFS uses a B+ tree implementation, and as described in Wikipedia,

ReFS uses B+ trees for all on-disk structures including metadata and file data. The file size, total volume size, number of files in a directory and number of directories in a volume are limited by 64-bit numbers, which translates to maximum file size of 16 Exbibytes, maximum volume size of 1 Yobibyte (with 64 KB clusters), which allows large scalability with no practical limits on file and directory size (hardware restrictions still apply). Metadata and file data are organized into tables similar to a relational database. Free space is counted by a hierarchical allocator which includes three separate tables for large, medium, and small chunks. File names and file paths are each limited to a 32 KB Unicode text string.
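Since ReFS, BTRFS and ZFS all lean on the same copy-on-write idea, here is a bare-bones sketch of my own (in Python, using a trivial dict-based tree rather than a real on-disk B+ tree) of how a COW update copies the path to the root and leaves the old tree intact, which is also why snapshots come almost for free:

# A toy copy-on-write "metadata tree": each update builds new nodes
# along the path and never modifies the old ones.

def cow_update(node, path, value):
    """Return a NEW root with `value` written at `path`; the old root is untouched."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    new_node = dict(node)                     # copy this node
    new_node[key] = cow_update(node.get(key, {}), rest, value)
    return new_node                           # parent points at the copy

root_v1 = {"dir1": {"fileA": "old contents"}}
root_v2 = cow_update(root_v1, ["dir1", "fileA"], "new contents")

print(root_v1["dir1"]["fileA"])   # 'old contents'  -> v1 is a ready-made snapshot
print(root_v2["dir1"]["fileA"])   # 'new contents'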

Alongside ReFS, Microsoft introduces Storage Spaces. And the concept is very, very similar to ZFS, with the seamless integration of a volume manager, RAID management, and a highly resilient file system. And ZFS is 10 years old. So much for ReFS being “next generation”. But here is a series of screenshots of what Storage Spaces looks like:

And similar to the “flexible volume management” of ONTAP FlexVol and ZFS file systems, you can add disk drives on the fly, and grow your volumes online, in real time.

ReFS inherits many of the NTFS features as it inches towards the Windows Server 8 launch date. Some of the features mentioned are BitLocker encryption, Access Control Lists (ACLs) for security (naturally), Symbolic Links, Volume Snapshots, File IDs and Opportunistic Locking (Oplocks).

ReFS is intended to scale, as Microsoft says, “to extreme limits”. Here is a table describing those limits:

The new ReFS technology will certainly bring Windows up to the stringent availability and performance requirements of modern day file systems, but the storage networking world is also evolving into the cloud computing space. Object-based file systems are getting into the act as market trends dictate new requirements, and file systems, in order to survive, must continue to evolve.

Microsoft’s file system, NTFS, took a long time to arrive at this present incarnation, ReFS, but can Microsoft continue to innovate and change the rules of the data storage game? We shall see …

IDC 4Q11 Tracker numbers are in

It was a challenging 2011, but the tremendous growth of data continues to spur the growth of storage. According to IDC in its latest Worldwide Quarterly Disk Storage Systems Tracker, the storage market grew a healthy 7.7% in factory revenues, and the total disk storage capacity shipped was 6,279 petabytes, up 22.4% year-on-year! What Greg Schulz once said is absolutely true: “There is no recession in storage.”

Let’s look at the numbers. Overall, the positions of the storage vendors did not change much, but to me, the more exciting part is the growth quarter over quarter.

EMC and NetApp continue to post double digit growth, with 25.9% and 16.6% respectively, once again taking market share from HP and others. HP, with the upheaval going on right now throughout the organization, got hit badly with a decline of 3.8%, while IBM held its ground at 0.0%. And growth of 0.0% when data is growing at more than 20% is not good, not good at all.

HDS, continuing its momentum with a good story, took a decent 11.6% growth, a fantastic number from HDS’s perspective. I have been out with my HDS buddies and I can feel an excitement and energy in them that I have never seen before. That is a good indicator of the innovation and new technology coming out of HDS. They just need to work on their marketing and tell the industry more about what they are doing. The Japanese can be so modest.

From the report, 2 things peeved me.

  • IDC reports that NetApp and HP are *tied* at 3rd. This does not make any sense at all! How can they be tied when NetApp has double digit growth, 11.2% market share and a revenue of USD$734 million while HP has negative growth, 10.3% market share and USD$677 million revenue? The logic boggles my mind!
  • They lumped Dell and Oracle into “others”! And “others” had -1.4% growth. I am eager to find out how these 2 companies are doing, especially Dell, which has been touting superb growth with its storage story.

Meanwhile, in TOTAL, here’s the table for the Total Worldwide Disk Storage Market share for 4Q11.

Numbers don’t lie. HP and IBM, in 2nd and 3rd place respectively, are not in good shape. Negative growth in an upward trending market spells more trouble in the long run, and they had better buck up.

In this table, Dell gained and went up to #4, ahead of NetApp, and from the look of things, Dell is doing all the right things to make its storage story gel together. Kudos! In fact, NetApp’s position and perception in the last 2-3 quarters have been shaky, with Dell and HDS breathing down its neck. HDS or Dell are unlikely to make a significant dent in NetApp at this point in time, but having been “the little engine that could” (that’s what I used to call NetApp) for the last few years, NetApp seems to be losing a bit of the extra “ooomph” that excited the market in the past.

Lastly, I just want to say that my comments are based on the facts and figures in the tables published by IDC. I remember the last time I commented with the same approach, buddies of mine in the industry disagreed with me, saying that each of them is doing great in the Malaysian or South East Asian market.

Sorry guys, I blog what I see, and I welcome you to take me to your sessions and let me know how well you are doing here. I would be glad to write more about it. (Hint, hint).

 

Violin pulling the strings

Violin Memory is on our shores as we speak. There is already confirmed news that an EMC veteran in Singapore has joined them and will be surfacing soon in the South Asia region.

Of all the all-Flash storage systems I have on my platter, Violin Memory seems to be the only one which is ready for IPO this year, after having taken in USD$75 million worth of funding in 2011. That was an impressive number considering the economic climate last year was not so great. But what is so great about Violin Memory that is attracting the big money? Both Juniper Networks and Toshiba America are early investors.

I am continuing my quest to look at all-Flash storage systems, after my blogs on Pure Storage, Kaminario and SolidFire. (Actually, I wanted to write about another all-Flash vendor first because it keeps bugging me with its emails ... but I am feeling annoyed with that one right now.) Violin Memory is here and now.

From a technology standpoint, there are a few key technologies, notably their vRAID and their Violin Switched Memory architecture (vXM), both patent pending. Let’s explore these 2 technologies.

At the core of Violin Memory is the vXM, a proprietary, patent-pending memory switching fabric, which Violin claims to be the first in the industry. The architecture uses high speed, fault tolerant memory controllers and FPGAs (field programmable gate arrays) to switch between corresponding, fully redundant elements of VIMMs (Violin Inline Memory Modules). The high level vXM architecture is shown below:

 

VIMMs are the building blocks, an aggregation of memory modules which can be of different memory types. The example below shows the aggregation of Toshiba MLC chips into VIMMs, and their further consolidation into the full capacity Flash array.

The memory switching fabric of the vXM architecture enables very high speed data switching and routing, and hence Violin can boast of “spike-free latency”, something we in this industry desperately need.

Another cool technology that Violin has is their hardware-based vRAID. This is a RAID algorithm designed to work with Flash and other solid state storage devices. I am going through the Violin Memory white paper now and the technology is some crazy, complicated sh*t. This is what is presented on their website about the low latency vRAID:

 

I don’t want to sound stupid writing about the vRAID now, and I probably need to digest the whitepaper several times in order to understand the technology better. And I will let you know once I have a fair idea of how this works.

More about Violin Memory later. Meanwhile, a little snag came up when a small Texas company, Narada Systems, filed a patent infringement suit against Violin on January 5, 2012. The suit claims that the vXM violates the technology and intellectual property of patents #6,504,786 and #7,236,488, and seeks damages from Violin Memory. You can read about the legal suit here.

Whether this legal suit will affect Violin Memory is anybody’s guess, but the prospect of Violin Memory going for IPO in just a few short years shows how the industry is looking at solid state storage solutions.

I have already mentioned a handful of solid state storage players which I call “all-Flash”, and over at the Network Computing site, blogger Howard Marks revealed 2 more stealth-mode solid state start-ups in XtremIO and Proximal Data. This validates the industry’s confidence in solid state storage, and in 2012, we are going to see a gold rush in this technology.

The storage industry is dying for a revamp on the performance side, and living with the bane of poor spinning disk performance for years has made the market hungry for IOPS, low latency and throughput. Solid state storage is ripe, and I hope this will trigger newer architectures in storage, especially RAID. Well done, Violin Memory!

 

NFS-phobic in Malaysia

I taught the EMC Cloud Infrastructure and Services (CIS) class last week and naturally, a few students came from the VMware space. I asked how they were implementing their storage and everyone said Fibre Channel.

I have spoken to a lot of people about this as well in the past, whether they are using SAN or NAS storage for VMware environments. And almost 99% would say SAN, either FC-SAN or iSCSI-based SAN. Why???

When I ask these people about deploying NFS, the usual reply would be related to performance.

NFS version 3 won the file sharing protocol race in its early days, when Unix variants were prevalent, though it was not helped by the Balkanization of Unices in the 90s. Furthermore, NFS lost quite a bit of ground between NFSv3 in 1995 and the coming out party of NFSv4.1 just 2 years ago. The in-between years were barren, and NFS became a bit of a joke, jibed as “Need For Speed” or “No F*king Security”. That could also be a contributing factor to the NFS-phobia we see here in Malaysia.

I have experience with both SAN and NAS and understand the respective protocols of Fibre Channel, iSCSI, NFS and CIFS, and I feel that NFS has been given unfair treatment by people in this country. For the uninformed, NFS is the only NAS protocol supported by VMware. CIFS, the Windows file sharing protocol, is not supported, probably for performance and latency reasons. However, if you keep up with high performance computing (HPC), clustering, or MPP (Massively Parallel Processing) literature, you will almost always read about NFS being involved in delivering very high performance I/O. So, why isn’t NFS proposed with confidence in VMware environments?

I have blogged about this before. And I want to use my blog today to reassert what I believe in and hope that more consideration can be given to NFS when it comes to performance, even for virtualized environments.

NFS performance is competitive when compared to Fibre Channel and, in a lot of cases, better than iSCSI. It is just that the perception of poor performance in NFS is stuck in people’s minds and it is hard to change that. However, there are multiple credible sources stating that NFS is comparable to Fibre Channel. Let me share one of the sources, which compared NFS with other transport protocols:

From the 2 graphs of IOPS and latency, NFS fares well against the more popular transport protocols in VMware environments. Those NFS performance numbers are probably not RDMA-driven either; otherwise RDMA could very well boost the NFS numbers onto even higher ground.

What is this RDMA (Remote Direct Memory Access)? RDMA is already quietly making its presence felt, being used with transports like InfiniBand and 10 Gigabit Ethernet. In fact, Oracle Solaris 11 will use RDMA as the default transport whenever RDMA-enabled NICs are present in the system. The diagram below shows where RDMA fits in the network stack.

RDMA eliminates the need for the OS to participate in the delivery of data, depositing data directly from the initiator’s memory into the target’s memory. This eliminates traditional networking overheads such as buffer copying and setting up network data structures for the delivery. A little comparison of RDMA with traditional networking is shown below:
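On top of that comparison, a quick toy in Python gives a feel for the class of overhead RDMA removes. This is not RDMA itself (that needs RDMA-capable NICs and verbs libraries); it only shows how much a plain in-memory buffer copy costs versus handing over a zero-copy reference:

import time

buf = bytearray(256 * 1024 * 1024)        # a 256 MB payload

t0 = time.perf_counter()
staging = bytes(buf)                      # the "traditional" extra buffer copy
t1 = time.perf_counter()
view = memoryview(buf)                    # zero-copy: no bytes are moved
t2 = time.perf_counter()

print(f"copy into staging buffer: {(t1 - t0) * 1000:7.1f} ms")
print(f"zero-copy memoryview:     {(t2 - t1) * 1000:7.3f} ms")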

I was trying to find out how prevalent NFS is in supporting the fastest supercomputers in the world, from the Top500 Supercomputing site. I did not find details of NFS being used, but what I did find was that the Top500 supercomputers do not employ Fibre Channel SAN at all! Most have proprietary interconnects, with some on InfiniBand and 10 Gigabit Ethernet. I would presume that NFS figures in many of them, and I am confident that NFS can be a protocol of choice for high performance environments, and even VMware environments.

The future looks bright for NFSv4. We are beginning to see the words “parallel NFS (pNFS)” being thrown into conversations around here, and the awareness is there. NFS version 4.2 is just around the corner as well, promising greater enhancements to the protocol.

 

One smart shopper

Dell just acquired AppAssure earlier this week, adding the new company to its stable of Compellent, EqualLogic, Perot Systems, Scalent, Force10, RNA Networks, Ocarina Networks, and Exanet (did I miss anyone?). This is not counting the various partnerships Dell has with the likes of CommVault, VMware, Caringo, Citrix, Kaminario, etc.

From 10,000 feet, Dell is building a force to be reckoned with. With its PC business waning, Dell is making all the moves to secure the datacenter space from various angles. And I like what I see. Each move is seen as a critical cog, moving Dell forward.

But the question is “Can Dell deliver?” It just missed Wall Street’s revenue expectations last week, but the outlook for Dell’s business, especially in storage, is looking bright. I caught this piece in Dell’s earnings call transcript:

"Server and networking revenue increased 6%. Total storage 
declined 13% while Dell-owned IP storage growth accelerated 33% 
to $463 million, led by continued growth in all of our Dell IP 
categories including Compellent, which saw over 60% sequential 
revenue growth."

Those are healthy numbers, but what’s most important is how Dell executes in the next 12-18 months. Dell has done very well with both Compellent and EqualLogic and is slowly bringing out its Exanet and Ocarina Networks technology in new products such as the EqualLogic FS7500 and the DR4000 respectively. Naturally, the scale-out engine from Exanet and the deduplication/compression engine from Ocarina will find their way into the Dell Compellent line in the months to come. And I am eager to see how the “memory virtualization” technology of RNA Networks fits into Dell’s Fluid Data Architecture.

The technologies from Scalent and AppAssure will push Dell into the forefront of the virtualization space. I have no experience with either product, but by the looks of things, these are solid products which Dell can easily and seamlessly plug into its portfolio of solutions.

The challenge for Dell is its people in the field. Dell has been pretty much a PC company, and still is. The mindset of a consumer-based PC company is very different from that of a datacenter-centric enterprise company.

Dell Malaysia has been hiring good people. These are enterprise-minded people. They have been moulded by the fires of the datacenters, and they were hired to give Dell Malaysia the enterprise edge. But the challenge for Dell Malaysia remains, and that is changing the internal PC-minded culture.

Practices such as dropping prices (disguised as discounts) at the first sign of competition, or giving away high-end storage solutions at almost-free prices, are, to me, not good strategies. Selling enterprise products on speeds and feeds alone, merely articulating a product’s features and benefits while lacking regard for the customer’s requirements and pain points, misses the target altogether. This kind of mindset, aiming for the quick sell, is not what Dell would want. Yes, we agree that quarterly numbers are important, but pounding the field sales for daily updates and forecasts will only lead to unpleasant endings.

Grapevines aside, I am still impressed with how Dell is getting the right pieces to build its datacenter juggernaut.

Solid?

The next all-Flash product in my review list is SolidFire. Immediately, the niche that SolidFire is trying to carve out is obvious. It’s not for regular commercial customers. It is meant for Cloud Service Providers, because the features and the technology they have innovated are very much cloud-intended.

Are they solid (pun intended)? Well, if they have managed to secure Series B funding of USD$25 million (USD$37 million in total) from VCs such as NEA and Valhalla, and also angel investors such as Frank Slootman (ex-Data Domain CEO) and Greg Papadopoulos (ex-Sun Microsystems CTO), then obviously there is something more than meets the eye.

The one thing I got while looking up SolidFire is that there is probably a lot of technology and innovation behind their storage nodes and their Element OS. They hold their cards very, very close to their chest, and I could not get much good technology-related information from their website or from Google. But here’s a look at what the SolidFire system is like:

SolidFire has only one product model, the 1U SF3010. The SF3010 has 10 x 2.5″ 300GB SSDs, giving it a raw total of 3TB per 1U. The minimum configuration is 3 nodes, and it scales to 100 nodes. The reason for starting with 3 nodes is, of course, redundancy. Each SF3010 node has 8GB of NVRAM and 72GB of RAM, and sports 2 x 10GbE ports for iSCSI connectivity, which is no surprise since the core engineering talent came from LeftHand Networks (whose product is now the HP P4000). There is no Fibre Channel or NAS front end to the applications.

Each node runs 2 x Intel Xeon 2.4GHz 6-core CPUs. The 1U height is important to the cloud provider, as the price of floor space is an important consideration.

Aside from the SF3010 storage nodes, the other important ingredient is their SolidFire Element OS.

Cloud storage needs to be available. The SolidFire Helix Self-Healing data protection is a feature capable of handling multiple concurrent failures across all levels of their storage. Data blocks are replicated randomly but intelligently across all storage nodes to ensure that the failure or disruption of access to a particular data block is circumvented with another copy of the data block somewhere else within the cluster. The idea is not new, but it is effective; solutions such as EMC Centera and IBM XIV employ the same idea for data availability. Still, the ability to self-heal makes for highly available storage where data is always accessible.

To address storage efficiency, having 3TB raw in the SF3010 is definitely not sufficient. Therefore, the Element OS always has thin provisioning, real-time compression and in-line deduplication turned on. These features cannot be turned off, and they operate on fine-grained 4K blocks. Also important in these innovations is the intelligence to reclaim zeroed blocks, with no reservations and no data movement. This means there is no I/O impact, as claimed by SolidFire.
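SolidFire does not publish its internals, so just to illustrate the general idea of in-line deduplication on 4K blocks (and skipping zeroed blocks), here is a rough Python sketch of my own:

import hashlib

BLOCK = 4096
block_store = {}          # fingerprint -> one physical copy of the 4K block

def ingest(data: bytes):
    """Return the list of fingerprints that logically make up `data`."""
    recipe = []
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK].ljust(BLOCK, b"\x00")
        if chunk == b"\x00" * BLOCK:      # zeroed block: store nothing (thin/reclaim)
            recipe.append(None)
            continue
        fp = hashlib.sha256(chunk).hexdigest()
        block_store.setdefault(fp, chunk) # identical 4K blocks are stored only once
        recipe.append(fp)
    return recipe

vol1 = ingest(b"A" * 8192 + b"\x00" * 4096)   # two identical blocks plus a zero block
vol2 = ingest(b"A" * 4096)                    # duplicates vol1's first block
print(len(block_store))                       # 1 unique 4K block stored for all that data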

But the one feature that differentiates SolidFire in targeting storage for Cloud Service Providers is their guaranteed volume-level Quality of Service (QoS). This is important, and SolidFire has turned their QoS settings into an advantage. As a best practice, Cloud Service Providers should always leverage the QoS functionality to improve their storage utilization.

The QoS settings include the following (a toy sketch of how such limits might be enforced follows the list):

  • Minimum IOPS – Lower IOPS means lower performance priority (makes good sense)
  • Maximum IOPS
  • Burst IOPS – for those performance spike moments
  • Maximum and Burst MB/sec
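To make the max/burst idea a bit more concrete, here is a simple token-bucket style limiter in Python. This is purely my own illustration of the concept, not SolidFire's implementation, and the minimum IOPS guarantee is really a scheduling promise by the cluster which a per-volume bucket like this cannot express:

import time

class IopsLimiter:
    """Cap a volume at `max_iops`, but allow short bursts up to `burst_iops`."""
    def __init__(self, max_iops, burst_iops):
        self.rate = max_iops              # tokens added per second
        self.capacity = burst_iops        # bucket size = how big a spike is allowed
        self.tokens = burst_iops
        self.last = time.monotonic()

    def allow_io(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True                   # admit the I/O
        return False                      # over the limit: queue or throttle it

vol = IopsLimiter(max_iops=1000, burst_iops=2000)
admitted = sum(vol.allow_io() for _ in range(5000))
print(f"admitted {admitted} of 5000 back-to-back I/Os")   # roughly the burst allowance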

The combination of QoS and storage capacity efficiency gives SolidFire an edge, because cloud providers can scale both performance and capacity in a more balanced manner, something that is not so simple with traditional storage vendors that rely on lots of spindles to achieve IOPS, sacrificing capacity in the process. But then again, with SSDs, the IOPS are plentiful (for now). SolidFire does not boast performance numbers of millions of IOPS or throughput in the tens of gigabytes per second like Violin, Virident or Kaminario, but what they want is to be recognized as cloud storage as it should be in a cloud service provider environment.

SolidFire calls this Performance Virtualization. Just as we would get to carve our storage volumes from a capacity pool, SolidFire allows different performance profiles to be carved out from the performance pool. This gives SolidFire the ability to mix storage capacity and storage performance in a seemingly independent manner, customizing the type of storage bundling required of cloud storage.

In fact, SolidFire claims only 50,000 IOPS per storage node (including the IOPS needed for replicating data blocks). Together with their native multi-tenancy capability, the 50,000 or so IOPS align well with many virtualized applications, rather than chasing a 10x performance improvement for a single application. Their approach is about a more balanced and spread-out I/O architecture for cloud service providers and the applications they service.

Their management is also targeted at the cloud. The system has a REST API that integrates easily with OpenStack, Citrix CloudStack and VMware vCloud Director. This seamless and easy integration matters because CSPs already have their own management tools, and the SolidFire API is REST-ready and integration-ready to do just that.

The power of the SolidFire API is probably overlooked by storage professionals trained in the traditional manner. But what the SolidFire API does is expose the full (I mean FULL) capability of managing and provisioning the SolidFire storage. Fronting it with REST means that it is really easy to integrate with an existing CSP management interface.
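To give a flavour of what that kind of integration looks like, here is a hypothetical REST-style provisioning call in Python. The URL, credentials and JSON field names below are illustrative guesses of mine, NOT SolidFire's documented API, so treat it as a sketch of the pattern rather than working sample code for their product:

import requests

CLUSTER = "https://cluster.example.com/api/v1"   # hypothetical endpoint

payload = {
    "name": "tenant42-vol01",
    "sizeGB": 500,
    "qos": {"minIOPS": 500, "maxIOPS": 2000, "burstIOPS": 4000},
}

# Carve out a volume with its QoS settings in a single HTTP call.
resp = requests.post(f"{CLUSTER}/volumes", json=payload,
                     auth=("admin", "password"), timeout=30)
resp.raise_for_status()
print("volume created:", resp.json())

The point is simply that anything a GUI can do is one HTTP call away for the CSP's own portal or orchestration layer.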

Together with the storage nodes and the Element OS, the whole package is aimed at being a significant storage platform for Cloud Service Providers (CSPs). Storage has always been a tricky component in cloud computing (despite what the storage vendors might claim), but SolidFire touts that their solution focuses on what matters most to CSPs.

CSPs would want to maximize their investment without losing their edge in the cloud offerings to their customers. SolidFire lists their benefits in these 3 areas:

  • Performance
  • Efficiency
  • Management

The edge in cloud storage is definitely solid for SolidFire. Their ability to leverage their position and steer away from the other all-Flash vendors’ battle zone makes a lot of sense as they aim to gain market share in the Cloud Service Provider space. I only wish they would share more about their technology online.

Fortunately, I found a video by SolidFire’s CEO, Dave Wright, which gives great insight into SolidFire’s technology. Have a look (it’s almost 2 hours long):

[2 hours later]: Phew, I just finished the video above and the technology is solid. Just to summarize,

  • No RAID (which is a Godsend for service providers)
  • Aiming for USD5.00 or less per Gigabyte (a good number!)
  • General availability in Q1 2012

Lots of confidence about the superiority of their technology, as portrayed by their CEO, Dave Wright.

Solid? Yes, Solid!

All aboard Starboard

The Internet was abuzz with the emergence of a new storage player, uncloaking from stealth mode yesterday. Calling themselves Starboard Storage Systems, they plan to shake up the SMB/SME market with a full-featured storage system that competes with established players, but with a much more competitive pricing model.

Let’s face it. A cheaper price sells. That is the first door opener right there.

Their model, the AC72, has a starting price of USD$59,995. The capacity is 24TB, and it also includes 3 SSDs for performance caching, but I can’t really comment on whether that is a good price. It’s all relative, and carries different contexts and perceptions in different geographies. An HDD today is already 3 or 4TB, and a raw capacity of 24TB is pretty easy to fill up.
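Just as a back-of-envelope exercise, using the list price against raw, undeduplicated capacity only:

price_usd = 59995
raw_gb = 24 * 1000            # 24TB raw, in decimal marketing terabytes

print(f"about ${price_usd / raw_gb:.2f} per raw GB")   # about $2.50 per raw GB

How that compares obviously depends on usable capacity after RAID and on what the incumbents quote locally, which is exactly the “it’s all relative” point.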

But Starboard Storage has a trick up its sleeve. An SMB/SME customer usually cannot afford to dedicate a storage system to just one application; it is too cost prohibitive. They are likely to have a mixed workload, with a combination of applications such as email (think Exchange), databases (think SQL Server or MySQL), web applications, file serving (think Windows clients) and server virtualization (think VMware or Hyper-V). They are likely to have a mix of Windows and Linux. Hence Starboard Storage combines all the file and block storage protocols, NFS, CIFS, iSCSI and Fibre Channel, in an “all-inclusive” license.

An “all-inclusive” license is smart thinking, because the traditional storage players such as EMC, IBM, NetApp and the rest of today’s top 6 storage vendors have far too prohibitive license cocktails. Imagine telling your prospect that your storage model is unified storage and supports all storage protocols, but they have to pay to have each protocol license turned on. In this day and age, the unified storage story is getting old and is no longer a potent selling advantage.

Other features such as snapshots and thin provisioning should be a shoo-in, along with replication, while flexible volumes (I am a NetApp guy ;-)) and aggregates (known as Dynamic Storage Pools in Starboard Storage terminology) are a must-have. I am really glad that Starboard Storage could be the first to bundle everything in an “all-inclusive” license. Let’s hope the big 6 follow suit.

The AC72 also sports an SSD Accelerator tier for automatic performance caching, for both reads and writes. It is not known (at least I did not research deeply enough) whether the Accelerator tier is also part of the “all-inclusive” license, but if it is, that is fantastic.

Starboard Systems’ marketing theme revolves around high speed sailboats. The AC in AC72 stands for America’s Cup, and the AC72 is the high speed catamaran. Their technology is known as the MAST (Mixed Workload, Application-Crafted Storage Tiering) architecture (that’s a mouthful), and the target is exactly that, a mixed workload environment, just what the SMB/SME ordered.

The AC72 has dual redundant controllers for system availability and scales up to 474TB with the right expansion shelves. Starboard claims better performance in the Evaluator Group IOmeter lab test, as shown in the graph below:

This allows Starboard to claim that they have the “best density with leading dollar per GB and dollar per IOPS”.

To me, the window of opportunity is definitely there. For a long time, many storage vendors have claimed to be SMB/SME-friendly without being really friendly. SMB/SME customers have expanded their requirements and want more, but there are too many prohibitions and restrictions on getting the full features from the respective storage systems vendors. Starboard’s generosity in including “all” licenses is definitely a boon for SMB/SMEs, and it means that someone has been listening to customers better than the others.

The appeal is there too, for Starboard Storage Systems, because looking at their specifications, I am sure they are not skimping on enterprise quality.

 

Speaking of enterprise storage systems, competing Taiwanese brands here in Malaysia such as Infortrend and QSAN aren’t exactly “cheap”. They might sound “cheap” because the local partners lack the enterprise mentality and frequently position these storage systems as cheap when comparing them with the likes of EMC, HP, IBM and NetApp. The “quality” of these local partners, in their ability to understand SMB/SME pain points, articulate benefits, or design and architect total storage and data management solutions for customers, contributes to the “cheap” mentality of customers.

I am not deriding these brands; they are good storage in their own right. But local partners of such brands tend to cut corners, cut prices at the customer’s first hesitation and, worst of all, make outrageous claims, all contributing to the “cheap” mentality. These brands deserve better.

SMB/SME customers, too, are at fault. Unlike what I have experienced with customers across the borders, both north and south of Malaysia, customers here take a very lackadaisical approach when they procure IT equipment and solutions for their requirements. They lack the culture of finding out more and improving. It is always about comparing everything to the cheapest price!

Other brands such as Netgear, D-Link and QNAP are really consumer-grade storage disguised as “enterprise-class” storage for the SMB/SME, most relying on lower grade CPUs such as Atom, Celeron and other non-enterprise x86 CPUs, and having less than 8GB of memory. I am not even sure if the memory in these boxes is ECC memory. Let’s not even get to the HDDs in these boxes, because most SMB/SME customers in Malaysia lack the knowledge to differentiate the MTBF hours of consumer and enterprise HDDs.

So, kudos to Starboard Storage Systems for packaging a superb bundle of enterprise features at an SMB/SME price, at least for the US market anyway. USD$59,995 is still a relatively high price in Malaysia, because most SMB/SMEs still make price the point of discussion. But I believe Starboard brings a no-brainer enterprise storage system if they ever consider expanding into Malaysia.

The last bastion – Memory

I have been in this industry for almost 20 years. March 2, 2012 will mark my 20th year, to be exact. I was never in the mainframe era, dabbled a bit in the minicomputer era during my university days, and managed to ride the waves of client-server, the Internet explosion in the early WWW days, the dot-com crash, and now cloud computing.

In those 20 years, I have seen the networking wars (in which TCP/IP and Cisco prevailed), the OS wars and the Balkanization of Unix (Windows NT came out the winner), the CPU wars (SPARC and PowerPC versus x86, in which x86 came out tops), and now data and storage. Yet the war over the last piece of the IT industry has yet to begin, or has it?

In the storage wars, it was pretty much the competition between NAS and SAN and the religious camps of storage in the early 2000s, but now that I have been in the storage networking industry for a while, every storage vendor is beginning to look pretty much the same to me, albeit with some slight differentiating factors once in a while.

In the wars that I described, there is a vendor for each of the products being peddled, but what about memory? We never question what brand of memory we put in our servers and storage, do we? In the enterprise world, it has got to be ECC DDR2/3 memory DIMMs and that’s about it. Why????

Even in server virtualization, the RAM and the physical or virtual memory are exactly just that – memory! Sure VMware differentiates them with a cool name called vRAM, but the logical and virtual memory is pretty much confined to what’s inside the physical server.

In clustering, architectures such as SMP and NUMA do use shared memory. Oracle RAC shares its hosts’ memory for the purpose of Oracle database scalability and performance. Such aggregated memory architectures, in one way or another, serve the purpose of a specific application’s functionality rather than sharing the memory in a pool for all general applications.

What if some innovative company came along and decided to do just that? Pool all the physical memory of all the servers into a single, cohesive, integrated memory pool, so that every application on each of the servers can use the “extended” memory in an instant, without some sort of clustering software or parallel database. One company has done it using RDMA (Remote Direct Memory Access), and their concept is shown below:

 

I have been a big fan of RDMA ever since NetApp came out with DAFS some years ago, though I only know a little bit about RDMA because I have not spent a lot of time on it. But I know RDMA’s key strength in networking, and when news came up of this little company called RNA Networks using RDMA to create a Memory Cloud, it piqued my interest.

RNA innovated with their Memory Virtualization Acceleration (MVX), which is layered on top of 10 Gigabit Ethernet or InfiniBand networks with RDMA. Within MVX, there are 2 components of interest – RNAcache and RNAmessenger. This memory virtualization technology allows hundreds of server nodes to lend their memory to the Memory Cloud, thus creating a very large and very scalable memory pool.

As quoted:

RNA Networks then plunks a messaging engine, an API layer, and a pointer updating algorithm
on top of the global shared memory infrastructure, with the net effect that all nodes in the
cluster see the global shared memory as their own main memory.

The RNA code keeps the memory coherent across the server, giving all the benefits of an SMP
or NUMA server without actually lashing the CPUs on each machine together tightly so they
can run one copy of the operating system.

The performance gains, as claimed by RNA Networks, were enormous. In a published test, running MVX showed a significant performance gain over SSDs, as shown in the results below:

This test was done in 2009/2010, so there were no comparisons with present day server-side PCIe Flash cards such as FusionIO. But even without these newer technologies, the performance gains were quite impressive.

In its earlier version 2.5, the MVX technology introduced 3 key features:

  • Memory Cache
  • Memory Motion
  • Memory Store

The Memory Cache, as the name implies, turned the memory pool into a cache for the NAS and file systems linked to the servers. At the time, the only NAS protocol supported was NFS. The cache stored frequently accessed data sets used by the servers. Each server could have simultaneous access to the data sets in the pool, with MVX handling the contention issues.

The Memory Motion feature gave OSes and physical servers (including hypervisors) access to shared pools of memory acting as a giant swap device during page-out/swap-out scenarios.

Lastly, the Memory Store was the most interesting to me. It turned the memory pool into a collection of virtual block devices, similar to the concept of RAMdisks. These RAMdisks presented very fast disks to the server nodes and their OSes; one server node could mount multiple instances of these virtual RAMdisks, and similarly, multiple server nodes could mount a single virtual RAMdisk for shared disk purposes.
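Conceptually, a virtual RAMdisk is just a block interface on top of memory. Here is a stripped-down Python sketch of the idea (mine, not RNA's code):

class RamDisk:
    """A toy block device living entirely in memory."""
    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.blocks = bytearray(num_blocks * block_size)

    def write_block(self, lba, data: bytes):
        assert len(data) == self.block_size
        off = lba * self.block_size
        self.blocks[off:off + self.block_size] = data

    def read_block(self, lba) -> bytes:
        off = lba * self.block_size
        return bytes(self.blocks[off:off + self.block_size])

disk = RamDisk(num_blocks=1024)           # a 4MB "disk" backed purely by RAM
disk.write_block(7, b"x" * 4096)
print(disk.read_block(7)[:4])             # b'xxxx'

MVX's trick, of course, was that the backing memory was not local RAM but a coherent pool stitched together from many servers over RDMA.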

The RNA Networks MVX scaled to hundreds of server nodes and supported architectures such as 32/64-bit x86, PowerPC, SPARC and Itanium. At the time, MVX was available for Unix and Linux only.

The technology RNA Networks built was a perfect example of how RDMA can be applied. Before this, memory was just memory, but this technology takes the last bastion of IT – the memory – out into the open. As cloud computing matures, memory is going to be THE component that defines the next leap forward, which is to make the cloud work like one giant computer. Extending memory, and incorporating memory both on-premise on the host side and in the cloud into one fast, low latency memory pool, would complete the holy grail of cloud computing as one giant computer.

RNA Networks was quietly acquired by Dell in July 2011 for an undisclosed sum and was absorbed into the grand scheme of Dell’s Fluid Data Architecture. One blog, Juku, covered the Dell Field Tech Day event back in 2011, and it posted:

The leitmotiv here is "Fluid Data". This tagline, that originally was used by Compellent
(the term was coined by one of the earlier Italian Compellent customer), has been adopted
for all the storage lineup, bringing the fluid concept to the whole Dell storage ecosystem,
by integrating all the acquired tech in a single common platform: Ocarina will be the
dedupe engine, Exanet will be the scale-out NAS engine, RNA networks will provide an interesting
cache coherency technology to the mix while both Equallogic and Compellent have a different
targeted automatic tiering solution for traditional block storage.

Dell is definitely quietly building something, and this could go on for some years. But for the author to write “Ocarina will be the dedupe engine, Exanet will be the scale-out NAS engine; RNA Networks will provide cache coherency technology …” means that Dell is determined to out-innovate some of the storage players out there.

How does it all play out in Dell’s Fluid Data Architecture? Here’s a peek:

It will be interesting to see how the RNA Networks technology gels the Dell storage technologies together, but one thing’s for sure: memory will be the last bastion, the one that cements cloud computing into the IT foundation of the next generation.

Kaminario who?

The name “Kaminario” intrigues me and I don’t know the meaning of it. It rolls nicely off the tongue, until you say it a few times fast and your tongue gets twisted in a jiffy.

Kaminario is one of the few prominent startups in the all-Flash storage space, having secured USD$15 million of Series C funding from big gun VCs Sequoia and Globespan Capital Partners in 2011. That brought their total to USD$34 million, and also brought them the attention of the storage market.

I am beginning my research into their technology and their product line, the K2, to see why they are special. I am looking for the angle that differentiates them, how they position themselves in the market, and why they deserved Series C funding.

Kaminario was founded in 2008, with headquarters in Boston, Massachusetts. They have a strong R&D facility in Israel and, looking at their management lineup, they are headed by several personalities with an Israeli background.

All this shouldn’t be a problem to many, except for the fact that Malaysia does not recognize Israel diplomatically, and some organizations here, especially the government, might have an issue with the Israeli link. But then again, we have a lot of hypocrites in Malaysian politics and I am not going there in my blog. It’s a waste of my time.

The key technology is Kaminario’s K2 SPEAR architecture, which defines a fundamental method to store and retrieve performance-sensitive data. Yes, since this is an all-Flash storage solution, performance numbers, speeds and feeds are the “weapons” to influence prospects with high performance requirements. Kaminario touts that their storage solution scales up to 1.5 million IOPS and 16GB/sec of throughput, and indeed these are fantastic numbers when you compare them with conventional HDD-based storage platforms. But nowadays, if you are in the all-Flash game, everyone else is touting similar performance numbers as well. So, it is no biggie.

The secret sauce of the Kaminario technology is, of course, its architecture – SPEAR, which stands for Scale-out Performance Storage Architecture. While Kaminario states that their hardware is pretty much off-the-shelf, open industry standard, under the covers the SPEAR architecture could have incorporated some special, proprietary design in its hardware to maximize the SPEAR technology. Hence, I believe there is a reason why Kaminario chose a blade-based system for the enclosures in its rack. Here’s a look at their hardware offering:

Using blades is a good idea because blades offer integrated wiring, consolidation, simple plug-and-play, ease of support, N+1 availability and so on. But this can also put Kaminario in an all-blades-or-nothing position. This is something some customers in Malaysia might have to get used to, because many would prefer their racks. I could be wrong, and let’s hope I am.

Each enclosure houses 16 blades, with N+1 availability. As I go through Kaminario’s architecture, the word availability keeps getting louder, and this could be something that differentiates Kaminario from the rest. Yes, Kaminario has the performance numbers, but Kaminario also has a highly available (are we talking 6 nines?) architecture inherent within SPEAR. Of course, I have not done enough to compare Kaminario with the rest yet, but right now, availability isn’t something that most all-Flash startups trumpet loudly. I could be wrong, but the message will become clearer as I go through my list of all-Flash players – SolidFire, Pure Storage, Virident, Violin Memory and Texas Memory Systems.

Each of the blades can be either an ioDirector or a DataNode, and they are interconnected internally with 1 and 10 Gigabit ports, with at least one blade acting as a standby to the rest in a logical group of production blades. The 10 Gigabit connections are used for “data passing” between the blades, for load balancing as well as for spreading out the availability function for the data. The Gigabit connection is used for management purposes.

In addition to that, there is also a Fibre Channel piece fronting the K2 to the hosts in the SAN. Yes, this is an FC-SAN storage solution, and since there is no mention of iSCSI, the IP-SAN capability is likely not there (yet).

Here’s a look at the Kaminario SPEAR architecture:

The 2 key components are the ioDirector and the DataNode. A blade can have a dedicated personality (either ioDirector or DataNode) or it can carry both personalities in one blade. The minimum configuration is 2 blades with 2 ioDirectors, for redundancy reasons.

The ioDirector is the front-facing piece. It presents the K2 block-based LUNs to the SAN and has the intelligence to dynamically load balance both reads and writes while optimizing its resource utilization. The DataNode plays the role of fetching, storing and backing up, and is pretty much the back-end worker.

With this description, there are 2 layers in the SPEAR architecture. Interestingly, while I mentioned that Kaminario is an all-Flash storage player, it actually has HDDs as well. The HDDs do not participate in primary data serving; they serve as backup containers for the primary data on the SSDs, which can be MLC Flash or DRAM. This back-end backup layer of HDDs is what I meant earlier about availability. Kaminario is making data availability one of its differentiating features.

That’s the hardware layout of SPEAR, but the more important piece is its software, the SPEAR OS. It has 3 patent-pending capabilities, with not-so-cool names (which are trademarked):

  1. Automated Data Distribution
  2. Intelligent Parallel I/O Processing
  3. Self Healing Data Availability

The Automated Data Distribution of the SPEAR OS acts as a balancer. It dynamically and randomly (in a random-equilibrium fashion, I think) spreads the data over the storage capacity for efficiency, SSD longevity and, of course, optimized performance balancing.
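Kaminario has not published the algorithm, but the general flavour of spreading blocks evenly over many nodes can be sketched with simple hashing. The toy Python below is my own illustration, not SPEAR's actual method:

import hashlib
from collections import Counter

NODES = [f"DataNode-{i:02d}" for i in range(16)]   # one enclosure's worth of blades

def place(block_id: int) -> str:
    """Deterministically map a block to a node; a good hash spreads blocks evenly."""
    digest = hashlib.sha256(str(block_id).encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

layout = Counter(place(b) for b in range(100_000))
print(layout.most_common(3))   # each node ends up with roughly 100000/16 blocks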

The second capability is Intelligent Parallel I/O Processing. The K2 architecture is essentially a storage grid. The internal 10 Gigabit interconnects tie all the nodes (ioDirectors and DataNodes) together in a grid-like fashion for the best possible inter-node communication. The parallelization of read and write I/O requests spreads them across the nodes in the storage grid, giving the best average response and service times.

Last but not least is Self Healing Data Availability, a capability to dynamically reconfigure access to the data in the event of node failure(s). Kaminario claims no single point of failure, which is something I would be very interested to verify if given a chance to assess the storage a bit deeper. So far, that’s all the information I have been able to get.

The Kaminario K2 product line comes in 3 models – D, F, and H.

D is DRAM only and F is MLC Flash only. The H model is a combination of both Flash and DRAM. Here is how Kaminario positions each of the 3 models:

 

Kaminario is one of the early all-Flash storage systems to have gained recognition in 2011. They were named a finalist in both the Storage Magazine and SearchStorage Storage Product of the Year competitions for 2011. This not only endorses a brand new market for solid state storage systems but validates an entirely new category in the storage networking arena.

Kaminario is one to watch in 2012, along with the others that I plan to review in the coming weeks. The battle of the Flash racks is coming!

BTW, Dell is a reseller of Kaminario.

Battle of flash racks coming soon

The battle is probably already here. It has just begun for rack mounted flash-based or DRAM-based (or both) storage systems.

We have read in the news about the launch of EMC’s Project Lightning, and I wrote about it. EMC is already stirring up the competition, aiming its guns at FusionIO. Here’s a slide from EMC comparing their VFCache with FusionIO.

Not to be outdone, NetApp set out to douse the razzmatazz of EMC’s Lightning, announcing the future availability of their server-side flash software (no PCIe card of their own), which will work with the major host-based/server-side PCIe Flash cards (FusionIO, heads up). Ah, in Sun Tzu’s Art of War, this is called helping your buddy fight the bigger enemy.

NetApp threw some FUD into the battle zone, claiming that EMC VFCache only supports 300GB while the NetApp flash software will support 2TB, NetApp multiprotocol, and VMware’s vMotion, DRS and HA (something VFCache does not support right now).

The battle of PCIe has begun.

The next battle will be for rackmount flash storage systems or appliances. EMC is following up with Project Thunder (because thunder comes after lightning), which is a flash-based storage system or appliance. Here’s a look at EMC’s preliminary information on Project Thunder:

And here’s how EMC is positioning the different storage tiers in the diagram below (courtesy of VirtualGeek), glued together by EMC FAST (Fully Automated Storage Tiering) technology:

But EMC is not alone, as there are already several prominent start-ups out there offering flash-based, rackmount storage systems.

In the battle ring, there is the Kaminario K2 with its SPEAR (Scale-out Performance Storage Architecture), Violin Memory with its Violin Switched Memory (vXM) architecture, Pure Storage’s Purity Operating Environment and SolidFire’s Element OS, just to name a few. Of course, we should never discount the granddaddy of all flash-based storage – the Texas Memory Systems RamSAN.

The whole cycle of competition in this new arena is starting all over again and it’s exciting for me. There is so much to learn about newer, more innovative architectures, and I intend to share more about these players in coming blog entries. It is time to take notice, because SSDs are dropping in price, FAST! And in 2012, I strongly believe this is the next battle for the storage players, both established and start-ups.

Let the battle begin!