VAST Data must be something special

[Preamble: I was invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

VAST Data's coming-out bash!

The delegates of Storage Field Day have always been a lucky bunch. We have witnessed several storage technology companies coming out of stealth at these Tech Field Days. The recent ones in memory for me were Excelero and Hammerspace. But to have the venerable storage doyen, Mr. Howard Marks, VAST Data's new tech evangelist, introduce the deep dive of the VAST Data technology was something special.

For those who know Howard, he is fiercely independent, very smart about storage technology, opinionated and not easily impressed. As a storage technology connoisseur myself, I believe Howard must have seen something special in VAST Data. They must be doing something extremely unique and impressive that someone like Howard could not resist, enough to make him jump to the vendor side. This sets the tone of my blog.

The VAST Data architecture (not so deep dive)

From a high level, VAST Data is separated into Compute Nodes and Databoxes. The Compute Nodes host the VAST Universal File System (UFS) in containers but are stateless. The metadata is housed on the Intel Optane 3D XPoint medium in the Databoxes, resulting in a loosely coupled cluster with a single global namespace across tens of thousands of Compute Nodes and thousands of Databoxes. Here is an architecture diagram of how the VAST Data pieces fit.
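To illustrate the idea of stateless Compute Nodes backed by shared metadata, here is a toy sketch in Python. The class and method names (`Databox`, `ComputeNode`, `lookup`) are my own assumptions for illustration, not VAST Data's actual implementation; the point is simply that because all state lives in the shared enclosure, any node can serve any request interchangeably.

```python
class Databox:
    """Stands in for the shared enclosure holding metadata on Optane SCM."""
    def __init__(self):
        self.metadata = {}  # path -> location of the data

class ComputeNode:
    """Stateless: keeps no local metadata, only a handle to the shared Databox."""
    def __init__(self, databox: Databox):
        self.databox = databox

    def write(self, path: str, location: str) -> None:
        # State persists in the Databox, never on the Compute Node itself
        self.databox.metadata[path] = location

    def lookup(self, path: str) -> str:
        return self.databox.metadata[path]

# Any node sees the same global namespace
shared = Databox()
nodes = [ComputeNode(shared) for _ in range(3)]
nodes[0].write("/vol/file1", "chunk-0042")
print(nodes[2].lookup("/vol/file1"))  # a different node resolves the same path
```

Losing a Compute Node in this model costs nothing but capacity to serve requests, which is what makes the loose coupling attractive.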

The UFS datastore is disaggregated and spread across all responsible Compute Nodes (called Element Stores) to define the scaling framework of the VAST Data technology. In between, the chosen network is NVMe-over-Fabrics; RoCEv2 and TCP are probably the most viable transports to grow with.

I like their Data Reduction

One of their key technologies which impressed me is their Data Footprint Reduction (DFR) technology. Yes, I still use the word “footprint” (for legacy reasons) because I wrote about this concept, called Native Format Optimization, some 6 years ago. It was, what I believed, the truest form of data reduction technology, getting the best out of the data. It very much reminded me of Ocarina Networks and the POC (proof-of-concept) my team and I did at Petronas with their SEG-Y and SEG-D data. The reduction was so astounding that Ocarina just blew away NetApp and Data Domain at the time. That was at the end of 2009.

Unlike the rigid (both fixed and variable block) confines of most DFR approaches, VAST Data's reduction dedupes the variably chunked data with a unique hashed fingerprint and then clusters “close enough” deduped data chunks. The technique resembles a master raw block with its incremental-forever offspring. And this, of course, is processed at the Intel Optane layer, resulting in fast writes and reads. Because the dedupe happens after the writes of the blocks have persisted, I would term this post-process dedupe, but the speed of Optane's SCM (Storage Class Memory) medium probably delivers inline deduplication speed as well. Oh, how times have changed!
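The "close enough" clustering idea can be sketched in a few lines of Python. This is a minimal, assumed illustration (a crude min-hash-style similarity fingerprint, fixed-size chunks, byte-level deltas), not VAST Data's actual algorithm: similar chunks tend to share a sketch, so each cluster stores one reference block plus small deltas for its offspring.

```python
import hashlib

def chunks(data: bytes, size: int = 4096):
    """Split data into fixed-size chunks (real systems use variable-sized chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def similarity_sketch(chunk: bytes, window: int = 8, keep: int = 4) -> tuple:
    """Crude similarity fingerprint: the smallest N hashes over sliding windows.
    Near-identical chunks tend to share the same minima ("close enough")."""
    mins = sorted(
        hashlib.blake2b(chunk[i:i + window], digest_size=8).digest()
        for i in range(max(1, len(chunk) - window))
    )[:keep]
    return tuple(mins)

def delta(reference: bytes, chunk: bytes) -> list:
    """Byte positions where a chunk differs from its cluster's reference block."""
    return [(i, b) for i, (a, b) in enumerate(zip(reference, chunk)) if a != b]

def reduce_data(data: bytes) -> dict:
    """Cluster similar chunks; store one reference + small deltas per cluster."""
    clusters = {}  # sketch -> (reference_chunk, [deltas])
    for c in chunks(data):
        key = similarity_sketch(c)
        if key in clusters:
            ref, deltas = clusters[key]
            deltas.append(delta(ref, c))   # store only the differences
        else:
            clusters[key] = (c, [])        # first chunk becomes the master block
    return clusters
```

Feeding in data with repeated or near-repeated chunks yields far fewer clusters than chunks, which is where the reduction comes from; production systems would of course use far stronger fingerprints and compressed delta encodings.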

Here’s a look at VAST Data’s data reduction technology.

The deduplicated blocks are compressed locally at the Databoxes for fast access, and the technology can even perform data reduction on already optimized data. As mentioned in this article, Commvault’s already deduped/compressed data could be further reduced with VAST Data’s reduction technology. VAST Data shared the compounded reduction result.

Will VAST Data continue to be the special one?

It is still too early to tell. NVMe-oF (NVMe-over-Fabrics) adoption is still at the doorstep. NVMe over RoCEv2/iWARP/TCP/Fibre Channel is still being sorted out by invested storage vendors. The embrace of QLC (Quad Level Cell) flash is slowly gaining momentum as SCM (Storage Class Memory) becomes the dominant performance tier, and new interconnects like CCIX and Gen-Z are also coming into the picture.

Where does VAST Data want to play? The QLC drives in their Databoxes certainly mean that it will be general data workloads at the webscale level, where massive data lakes are becoming a problem. The data protection features and future replication definitely tell the tale that what VAST Data has engineered today is really for the future. In a year or two, many webscalers will reach breaking points with their invested infrastructures and data management services.

Commercial HPC workloads are also gaining ground, with Machine Learning/Deep Learning/Artificial Intelligence applications leading the onslaught. I believe data tools and pipeline frameworks should bring greater integration (and I stress the word “integration”) to new and future workloads.

At the Storage Field Day session, there were discussions addressing the secondary data/data protection market. With the lines blurring between primary and secondary, there may be just one tier of data, and one that will be a massive data lake for both storage and analytics.

I especially liked the loose coupling and stateless Compute Nodes, because modern edge computing demands faster, yet simpler, data storage and compute technology. Automation with disaggregated and composable frameworks at the edge is certainly a needed feature play. Partnerships with cloud/edge service providers seem to be a logical path for VAST Data’s expansion plans.

I got great vibes from VAST Data, and I think with Mr. Howard Marks as Chief Rah-Rah, the path to becoming the special one is well lit. The executives present at Storage Field Day were Renen Hallak (CEO and Co-Founder), Jeff Denworth (VP of Products and Co-Founder) and Michael Wing (President).

A great start for VAST Data. They must be something special.

About cfheoh

I am a technology blogger with 20+ years of IT experience. I write heavily on technologies related to storage networking and data management because that is my area of interest and expertise. I introduce technologies with the objective of getting readers to *know the facts*, and use that knowledge to cut through the marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress. I am involved in SNIA (Storage Networking Industry Association) and as of October 2013, I have been appointed as the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I was previously the Chairman of SNIA Malaysia until Dec 2012. As of August 2015, I am returning to NetApp as the Country Manager of Malaysia & Brunei. Given my present position, I am not obligated to write about my employer and its technology, but I am indeed subject to the Social Media Guidelines of the company. Therefore, I would like to make a disclaimer that what I write is my personal opinion, and mine alone. Therefore, I am responsible for what I say and write, and this statement indemnifies my employer from any damages.
