[Preamble: I was a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation were paid for by GestaltIT, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog reflects my own opinions and views.]
Cohesity SpanFS impressed me. Their filesystem was designed from the ground up to meet the demands of voluminous cloud-scale data, and yes, the sheer magnitude of data everywhere needs to be managed.
We all know that primary data is always the more important piece of the data landscape, but there is a growing need to address the secondary data segment as well.
Like a floating iceberg, the piece that is sticking out is the more important primary data, but the larger piece beneath the surface of the water, which is the secondary data, is becoming more valuable. Applications such as file shares, archiving, backup, test and development, and analytics and insights are maturing into the foundational data management frameworks and are fast becoming the bedrock of businesses.
The ability of businesses to bounce back after a disaster; the relentless testing of large data sets to develop new competitive advantage for businesses; the affirmations and the insights from analyzing data to reduce risks in decision making; all these are the powerful back-end engines that thrust businesses forward. Even the ability to search for the right information in a sea of data for regulatory and compliance reasons is part of the organization’s data management application.
We, as storage infrastructure consultants as well as data management practitioners, know well that secondary data management is a mess. There are data silos across the organization, often exacerbated by disjointed business and operational practices. They are further fractured by offices spread across different locations, the ROBO (remote office branch office) scenarios.
Another challenge is archived data. We have seen poorly implemented data management practices cause organizations to default on their duties. I recall that the CEO of Shell, the Oil & Gas major, had to resign after failing to account for how the company overstated its oil reserves by 20%. You can read about that news here. It was a clear case of poor data management of archived data, and the lack of a search capability.
Cohesity is disrupting the secondary data management segment with their hyperconverged Data Platform. At its core is SpanFS, their cloud-scale, distributed data protection filesystem. What impressed me most is their strict consistency guarantee.
With my interest piqued by their Storage Field Day 15 session, I researched why SpanFS is special. The strict data consistency question has been on my mind for several months now, and this led me down the path of consensus algorithms.
Delivering strict consistency at cloud scale is not easy. In distributed systems, the CAP theorem often drives vendors to settle for eventual consistency because it is simpler to implement. Cohesity SpanFS takes the opposite approach, because when it comes to data, especially data that affects the business, strict consistency is paramount.
To understand further, knowing the background of Mohit Aron, Cohesity’s CEO and founder, was a big plus. He was a lead developer of the Google File System and was previously the CTO of Nutanix.
In traditional scale-up storage systems, data writes are asynchronous. They are often written to a non-volatile memory medium and write acknowledgments are sent back to the hosts or applications quickly. In scale-out storage systems, data writes are acknowledged as fast as possible too, but the data written may not be known to all the nodes in the scale-out cluster at that particular instant. This trades away the ability to immediately see all the writes from any node in exchange for speed. A fast acknowledgment translates to better overall performance, but not necessarily to complete data consistency.
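To make the trade-off concrete, here is a minimal toy sketch in Python of a scale-out cluster that acknowledges a write before its peers have seen it. This is purely illustrative and not how any particular vendor implements it; the names AsyncCluster, Node, replicate and demo_async are hypothetical and made up for this example.

```python
class Node:
    """A toy storage node holding its own copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncCluster:
    """Toy cluster that acknowledges a write before peers have received it."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]
        self.pending = []  # replication work that has not been applied yet

    def write(self, key, value):
        # The write lands on one node only, then is acknowledged immediately.
        primary = self.nodes[0]
        primary.data[key] = value
        # Copies for the other nodes are merely queued for later.
        for peer in self.nodes[1:]:
            self.pending.append((peer, key, value))
        return "ack"

    def read(self, node_index, key):
        # Reads are served from whichever node the client happens to reach.
        return self.nodes[node_index].data.get(key)

    def replicate(self):
        # Background replication eventually brings the peers up to date.
        for peer, key, value in self.pending:
            peer.data[key] = value
        self.pending.clear()


def demo_async():
    cluster = AsyncCluster(["n1", "n2", "n3"])
    ack = cluster.write("file.txt", "v1")
    stale = cluster.read(2, "file.txt")  # third node has not seen the write yet
    cluster.replicate()
    fresh = cluster.read(2, "file.txt")  # now the third node has caught up
    return ack, stale, fresh
```

Calling demo_async() shows the window of inconsistency: the write is acknowledged, yet a read from another node returns nothing until the background replication has run.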
Cohesity SpanFS employs a method to write to all the nodes before a write acknowledgment is sent to the host or application. This may seem strange because the system’s latency will be affected. But SpanFS delivers high performance with low latency through a combination of distributed locking and their implementation of an optimized Paxos consensus algorithm. I believe this is their unique differentiator, making SpanFS a high performance filesystem with strict consistency.
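The write-everywhere-then-acknowledge idea can be sketched in a few lines of Python. This is my own simplified illustration under loud assumptions, not Cohesity's code: it does not implement Paxos at all, a single in-process threading.Lock merely stands in for distributed locking, and the names StrictCluster and demo_strict are hypothetical.

```python
import threading


class StrictCluster:
    """Toy cluster that acknowledges a write only after every replica has it.

    Assumption: one in-process lock stands in for a distributed lock, and
    plain dictionaries stand in for nodes. No consensus protocol is modeled.
    """

    def __init__(self, node_count):
        self.replicas = [dict() for _ in range(node_count)]
        self._lock = threading.Lock()

    def write(self, key, value):
        # Hold the lock so no reader can observe a half-applied write.
        with self._lock:
            for replica in self.replicas:
                replica[key] = value
        # The acknowledgment is returned only after all replicas are updated.
        return "ack"

    def read(self, node_index, key):
        with self._lock:
            return self.replicas[node_index].get(key)


def demo_strict():
    cluster = StrictCluster(3)
    ack = cluster.write("file.txt", "v1")
    # Every node returns the same value as soon as the write is acknowledged.
    values = [cluster.read(i, "file.txt") for i in range(3)]
    return ack, values
```

The cost is visible in the write path: the acknowledgment waits for the slowest replica, which is why doing this with low latency at scale is the hard part.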
Cohesity SpanFS is a full-featured data services platform. Another very strong feature they presented is SnapTree, which underpins the data protection power of their Data Platform. I am not going in depth into SnapTree as it is well documented here.
The company is growing at a phenomenal rate, and to support their hypergrowth, they have been hiring some of the best around the world.
The Storage Field Day delegates were welcomed at Cohesity’s headquarters in San Jose, and were given a good reception with Mohit leading the friendly hospitality. I enjoyed the visit immensely, ending with a great lunch. I wish Cohesity great success and we look forward to greater things as well.