The news of EMC’s would-be acquisition of XtremIO a few weeks ago was an open secret, and rumour has it that NetApp was eyeing XtremIO as well. It looks like EMC has beaten NetApp to it yet again.
The interesting part was, of course, the price. USD$430 million is a very high price to pay for a stealthy, 2-year-old company with 2 rounds of funding totaling USD$25 million. Why such a large amount?
XtremIO has a talented team of engineers, the notable ones being Yaron Segev and Shahar Frank. Both have backgrounds in InfiniBand, and Shahar Frank was the chief architect of the Exanet scale-out NAS (which was acquired by Dell). As quoted by 451Group, XtremIO is building an all-flash SAN array that “provides consistently high performance, high levels of flash endurance, and advanced functionality around thin provisioning, de-dupe and space-efficient snapshots”.
Furthermore, XtremIO has developed a real-time inline deduplication engine that does not degrade performance, achieved by spreading the write I/Os over the entire array. There is little public information about this deduplication engine, but I would bet XtremIO has built a real-time, inherently deduplicating file system that spreads all I/Os to balance wear-leveling while scaling performance. My guess: it dedupes everything it stores, with a B+ tree, copy-on-write file system and a super-efficient hashing algorithm for address mapping (pointers). OK, OK, I am getting carried away here, because it is likely that I will be wrong, but I can imagine, can’t I?
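The mechanics I am guessing at can be sketched in a few lines: a content-addressed store hashes every incoming block and physically writes only blocks it has never seen before, with the hash table doubling as the address map. This is purely my own toy illustration of inline deduplication, not XtremIO’s actual design; all the names here are made up:

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store with inline dedupe via hashing.

    Purely illustrative -- NOT XtremIO's actual design. Each logical
    block address (LBA) points to the fingerprint (hash) of its content;
    identical blocks are stored physically only once.
    """

    def __init__(self):
        self.blocks = {}       # fingerprint -> physical block data
        self.address_map = {}  # LBA -> fingerprint (the "pointers")

    def write(self, lba, data):
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.blocks:    # new content: one physical write
            self.blocks[fp] = data
        self.address_map[lba] = fp   # a duplicate costs only a pointer update

    def read(self, lba):
        return self.blocks[self.address_map[lba]]

store = DedupStore()
store.write(0, b"A" * 4096)
store.write(1, b"A" * 4096)   # duplicate content, no physical write
store.write(2, b"B" * 4096)
print(len(store.blocks))      # 2 physical blocks for 3 logical writes
```

On flash, skipping the duplicate write matters twice over: it saves an I/O operation and it saves a program/erase cycle, which is exactly why inline (rather than post-process) dedupe fits flash so well.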
I am reading the XtremIO whitepaper “Flash Implications in Enterprise Storage Array Designs”. I am writing this as I read; therefore, the comments written before this paragraph (especially the deduplication file system speculation above) have no bearing on what I write from now on.
The whitepaper doesn’t give away what the XtremIO technology is. In fact, there’s nothing about XtremIO’s super-secretive secret sauce. Instead, it paints scenarios of the common features (and problems) of enterprise storage systems, and the issues that confront us as storage professionals when we explain our storage technology to potential customers.
I like that whitepaper a lot, because it pricks our conscience about speaking openly of the pros and the cons; the rights and the wrongs.
When NetApp tells you how great primary deduplication is for capacity savings, do they tell you about the performance hit during the deduplication post-processing? When EMC says their Copy-on-First-Access (COFA) snapshots are great, do they mention that they have to do 2 x write I/Os to complete an update after the snapshot is taken? When NetApp touts their Copy-on-Write snapshots (btw, NetApp’s snapshots should really be called Redirect-on-Write – read here), do they explain that their snapshots result in fragmented I/Os for subsequent reads?
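The snapshot trade-offs above come down to counting writes. Here is my own simplification (not any vendor’s code): copy-on-write must first copy the old block aside before overwriting it in place, costing two data writes, while redirect-on-write lands the new data in a fresh location and only updates a pointer, costing one write but scattering the volume’s blocks over time:

```python
# Toy comparison of copy-on-write vs redirect-on-write snapshots.
# My own simplification -- not NetApp's or EMC's implementation.

def overwrite_cow(volume, snapshot, lba, new_data):
    """Copy-on-write: preserve the old block for the snapshot,
    then overwrite the volume in place. Two data writes."""
    writes = 0
    if lba not in snapshot:
        snapshot[lba] = volume[lba]   # write 1: copy old data aside
        writes += 1
    volume[lba] = new_data            # write 2: the actual update
    writes += 1
    return writes

def overwrite_row(volume_map, log, lba, new_data):
    """Redirect-on-write: new data goes to a fresh location; the old
    block stays put for the snapshot. One data write, but the volume's
    blocks end up scattered (hence fragmented reads later)."""
    log.append(new_data)              # write 1: new block elsewhere
    volume_map[lba] = len(log) - 1    # pointer (metadata) update
    return 1

vol, snap = {0: b"old"}, {}
print(overwrite_cow(vol, snap, 0, b"new"))   # 2 writes, snapshot keeps b"old"

vmap, log = {0: 0}, [b"old"]
print(overwrite_row(vmap, log, 0, b"new"))   # 1 write, old block untouched
```

On spinning disk the redirect-on-write fragmentation hurts sequential reads; on flash, where random reads are cheap, the one-write approach wins almost outright, which is part of the whitepaper’s argument.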
The concluding paragraph of the whitepaper speaks of how these present enterprise storage designs impact the need for performance. Most of them don’t deliver it, preferring storage efficiency and capacity savings over performance. XtremIO wants to change that, and has listed the following requirements for its solution (copied verbatim from the whitepaper):
- It must be a scale-out design. Flash simply delivers performance levels beyond the capabilities of any scale-up (i.e. dual controller) architecture.
- The system must automatically balance itself. If balancing does not take place then adding capacity will not increase performance.
- Attempting to sequentialize workloads no longer makes sense since flash media easily handles random I/O. Rather, the unique ability of flash to perform random I/O should be exploited to provide new capabilities.
- Cache-centric architectures should be rethought since today’s data patterns are increasingly random and inherently do not cache well. Likewise, tiering is largely ineffective for any active data set.
- Any storage feature that performs multiple writes to function must be completely rethought for two reasons.
- First, the extra writes steal available I/O operations from serving hosts. And second, with flash’s finite write cycles, extra writes must be avoided to extend the usable life of the array.
- Array features that have been implemented over time are typically functionally separate. For example, in most arrays snapshots have existed for a long time and deduplication (if available) is fairly new. The two features do not overlap or leverage each other in any way. But there are significant benefits to be realized through having a unified metadata model for deduplication, snapshots, thin provisioning, replication, and other advanced array features.
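The “automatically balance itself” point above can be pictured with simple hash-based placement: spreading blocks across nodes by a hash of their identity keeps the load roughly even, so adding a node adds proportional performance. This is a generic sketch of the idea, not XtremIO’s actual placement algorithm:

```python
import hashlib
from collections import Counter

def place(block_id: str, num_nodes: int) -> int:
    """Pick a node for a block by hashing its ID -- a generic sketch
    of automatic balancing, not XtremIO's actual algorithm."""
    h = int(hashlib.md5(block_id.encode()).hexdigest(), 16)
    return h % num_nodes

# With many blocks, every node receives a near-equal share of the
# writes, so capacity and performance scale together.
load = Counter(place(f"block-{i}", 4) for i in range(100_000))
print(sorted(load.values()))   # four counts, each close to 25,000
```

A real scale-out array would use something closer to consistent hashing so that adding a node moves only a small fraction of the blocks; plain modulo placement, as here, is just the simplest way to show the balancing effect.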
I have underlined the key arguments above. If we take the time to understand the existing storage technologies out there (NetApp FAS, EMC CX and VNX, Dell Compellent, HDS) and compare them with the points above, a lot of the features these vendors offer will be blown out of the picture.
There will be plenty of arguments about whose technology is better, and it is a no-brainer that many of the traditional enterprise storage vendors are now throwing Flash SSDs at the I/O performance issue. The problem is, they have probably dug themselves a very deep hole going after storage efficiency (thin provisioning, snapshots, deduplication, RAID, cloning and many others) in the past, without giving priority to I/O performance. Now that SSDs are tilting the scale towards I/O performance, these vendors will have a tough time undoing all that they have done.
This becomes a perfect excuse to buy smaller all-Flash start-ups. EMC has acquired XtremIO. Violin Memory, Virident, Kaminario, SolidFire, Pure Storage and NexGen are all bridesmaids waiting to be swept off their feet by a larger suitor. And it is the one with the most financial muscle that is most likely to win.
Let the bridesmaid bidding wars begin!