The engineering of Elastifile

[Preamble: I was a delegate at Storage Field Day 12. My expenses, travel and accommodation were paid for by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event]

When it comes to large-scale storage capacity requirements with distributed cloud and on-premise capability, object storage is all the rage. Amazon Web Services launched its object-based S3 storage service more than a decade ago, and the romance with object storage began.

Today, there are hundreds of object-based storage vendors out there, touting feature after feature of invincibility. But after researching and reading through many design and architecture papers, I found that many object-based storage vendors began to sound the same.

At the back of my mind, I know object storage is not easy to integrate with most applications. Yes, there is a new breed of cloud-based applications that use RESTful CRUD (Create, Read, Update, Delete) API operations to access object storage, but most applications still rely on file systems to access storage for capacity, performance and protection.

These CRUD and CRUD-like APIs are the common semantics for interfacing with object storage platforms. But many, many real-world applications are not built around object semantics. They are mostly designed to interface and interact with file systems, and secretly, I believe many application developers and users want a file system interface to storage, whether that storage is on-premise or in the cloud.

Let’s not kid ourselves. We are most natural when we work with files and folders.

Implementing object storage also denies us the ability to make optimal use of Flash and solid-state storage on-premise when the compute is in the cloud. Similarly, when the compute is on-premise and the Flash-based object storage is in the cloud, there is a mismatch between performance and availability requirements. In the end, there has to be a compromise.

Another “feature” of object storage is its poor handling of transactional data. Most object storage platforms do not allow data to be modified in place once an object has been created; changing even a few bytes means rewriting the whole object. Adding a NAS front end (aka a NAS gateway) does not change the fact that it is still object-based storage at the very core of the infrastructure, whether on-premise or in the cloud.
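To make the difference concrete, here is a minimal Python sketch contrasting a small update to an immutable object with the same update to a file. `ToyObjectStore`, `update_object_range` and `update_file_range` are hypothetical names for illustration only. The toy store simply assumes that, as with S3-style object storage, an object can only be replaced as a whole, while the file side uses the standard POSIX positional write (`os.pwrite`, available on Unix-like systems).

```python
"""Toy contrast: whole-object replacement versus an in-place file update.

Hypothetical sketch. ToyObjectStore is an in-memory stand-in, not any
vendor's API; it only models "objects are immutable once written".
"""
import os
import tempfile


class ToyObjectStore:
    """In-memory store where an object can only be replaced as a whole."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self.bytes_written = 0  # total bytes sent by PUT operations

    def put(self, key: str, data: bytes) -> None:
        # A PUT always carries the full object body.
        self._objects[key] = bytes(data)
        self.bytes_written += len(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


def update_object_range(store: ToyObjectStore, key: str, offset: int, patch: bytes) -> None:
    """Change a few bytes: read the whole object, patch it, write it all back."""
    body = bytearray(store.get(key))
    body[offset:offset + len(patch)] = patch
    store.put(key, bytes(body))


def update_file_range(path: str, offset: int, patch: bytes) -> int:
    """Change the same bytes in a file with a positional write (POSIX pwrite)."""
    fd = os.open(path, os.O_RDWR)
    try:
        return os.pwrite(fd, patch, offset)  # writes only len(patch) bytes
    finally:
        os.close(fd)


if __name__ == "__main__":
    record = b"x" * 1_000_000  # a 1 MB "database file"

    store = ToyObjectStore()
    store.put("db/data-file", record)
    store.bytes_written = 0  # count only the cost of the update
    update_object_range(store, "db/data-file", 4096, b"NEW")
    print("object store bytes written:", store.bytes_written)  # 1000000

    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(record)
        path = f.name
    print("file bytes written:", update_file_range(path, 4096, b"NEW"))  # 3
    os.remove(path)
```

A NAS gateway can hide this read-modify-write cycle from the application, but the object layer underneath still pays for it.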

Resiliency, latency and scalability are the greatest challenges when we want to build a true globally distributed storage or data services platform. Object storage can be resilient and it can scale, but it has to compromise on performance and latency to do so. And managing object storage will never be as natural as managing a file system with folders and files.

Enter Elastifile.

Let’s look at the 3 requirements of a highly distributed global storage and data services platform – Resiliency, Latency and Scalability.

Elastifile impressed me because they approached their design from a globally distributed FILE SYSTEM perspective. From the start, Elastifile set out to give most real-world applications what they want: folders and files. Most object storage vendors will never have a high-performance file system because their platforms were never designed as file systems. And yes, if there is a need for an object interface, Elastifile has that too.

Designing a distributed system is complex, and the CAP theorem (Consistency, Availability, Partition tolerance) applies: when the network partitions, a system has to give up either consistency or availability. Distributed object storage systems are usually designed to keep Availability and Partition Tolerance intact, letting Consistency eventually catch up. In a distributed file system, however, Consistency must be preserved, because writes and updates must be committed for data integrity. But Consistency drags on performance: each write has to be acknowledged by several nodes before it is committed, which adds latency. And with many participating nodes in a distributed cluster, any of them could fail, impacting resiliency.
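The trade-off shows up in the quorum arithmetic that many strongly consistent replicated systems rely on. The sketch below is a generic illustration, not a description of how Elastifile places replicas: with N replicas, a read quorum of size R and a write quorum of size W are guaranteed to overlap only when R + W > N, and a majority quorum of floor(N/2) + 1 keeps working as long as enough replicas remain reachable to form it. The function names are my own, chosen for illustration.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums share a replica."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Replicas that can fail while a majority is still reachable."""
    return n - majority(n)


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every read quorum intersect every write quorum?"""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"N={n}: majority={majority(n)}, tolerates {tolerated_failures(n)} failed replicas")
    # R + W > N: every read quorum includes at least one replica holding the latest write.
    print(quorums_always_overlap(3, 2, 2))  # True
    # R + W <= N: quicker and more available, but a read can miss the latest write.
    print(quorums_always_overlap(3, 1, 1))  # False
```

For a fixed N, raising W means each write waits for more acknowledgements (latency), and fewer failed replicas can be tolerated before writes must stop to preserve consistency (availability).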

The experts at Elastifile overcame this challenge by designing a new consensus algorithm to lay the foundation of their globally distributed, highly resilient, low-latency file system. I was introduced to Bizur, a new key-value consensus algorithm designed by Ezra Hoch, the Chief Architect of Elastifile, and his team. Bizur enables a highly scalable distributed file system with minimal latency impact should nodes in the global cluster fail.
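Bizur's internals go beyond what I can cover here, so the following is only a toy Python sketch of one idea associated with it, as I understand the published description: splitting the key-value space into buckets and replicating each bucket as an independent register on a majority of nodes, instead of ordering every update through one shared replicated log. `Replica`, `BucketedKV`, the SHA-256 bucket hashing and the 16-bucket default are all hypothetical choices for illustration. The sketch assumes a single client and omits leader election, ballots and recovery, so it is not Elastifile's algorithm.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding a copy of every bucket (toy, in-memory)."""
    name: str
    up: bool = True
    # bucket id -> (version, {key: value}); version 0 means "never written"
    buckets: dict = field(default_factory=dict)


class BucketedKV:
    """Hypothetical key-value store: each bucket is a majority-replicated register."""

    def __init__(self, replicas: list, num_buckets: int = 16) -> None:
        self.replicas = replicas
        self.num_buckets = num_buckets
        self.quorum = len(replicas) // 2 + 1  # any two majorities share a replica

    def bucket_of(self, key: str) -> int:
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.num_buckets

    def _live_majority(self) -> list:
        live = [r for r in self.replicas if r.up]
        if len(live) < self.quorum:
            raise RuntimeError("no majority reachable")
        return live

    def _newest_copy(self, bucket: int) -> tuple:
        # Read from one majority; it overlaps every earlier write majority,
        # so (with a single client) the highest version seen is the latest.
        majority = self._live_majority()[: self.quorum]
        copies = [r.buckets.get(bucket, (0, {})) for r in majority]
        return max(copies, key=lambda copy: copy[0])

    def put(self, key: str, value) -> None:
        bucket = self.bucket_of(key)
        version, data = self._newest_copy(bucket)
        new_copy = (version + 1, {**data, key: value})
        # Write to every live replica (at least a majority, checked above).
        for replica in self._live_majority():
            replica.buckets[bucket] = new_copy

    def get(self, key: str):
        _, data = self._newest_copy(self.bucket_of(key))
        return data.get(key)


if __name__ == "__main__":
    nodes = [Replica(f"node{i}") for i in range(5)]
    kv = BucketedKV(nodes)
    kv.put("/home/alice/report.doc", "inode-1001")
    kv.put("/home/bob/notes.txt", "inode-1002")
    nodes[0].up = nodes[1].up = False        # two of five replicas fail
    print(kv.get("/home/alice/report.doc"))  # inode-1001
    nodes[2].up = False                      # a third failure loses the majority
    try:
        kv.get("/home/bob/notes.txt")
    except RuntimeError as err:
        print("unavailable:", err)           # unavailable: no majority reachable
```

In this toy, each bucket carries its own version number, so updates to unrelated buckets are never ordered against each other the way entries in a single shared log would be.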

With an all-Flash architecture and intelligent metadata caching, I saw their demo of 100 nodes and 10,000 instances delivering 888,000 IOPS at 1.3 ms latency in Google Cloud. I was totally blown away.
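A quick back-of-the-envelope check puts the demo numbers in perspective. Assuming the 1.3 ms figure is an average latency (the demo may have reported a different statistic), Little's law (items in flight = throughput × time in system) gives the number of I/Os that must be outstanding across the cluster at any moment. The function name is mine, for illustration.

```python
def outstanding_ios(iops: float, latency_s: float) -> float:
    """Little's law: average I/Os in flight = throughput x average latency."""
    return iops * latency_s


TOTAL_IOPS = 888_000   # figure quoted from the demo
LATENCY_S = 1.3e-3     # 1.3 ms, assumed to be an average
NODES = 100

per_node_iops = TOTAL_IOPS / NODES
cluster_in_flight = outstanding_ios(TOTAL_IOPS, LATENCY_S)

if __name__ == "__main__":
    print(f"IOPS per node: {per_node_iops:,.0f}")                       # 8,880
    print(f"I/Os in flight, cluster-wide: {cluster_in_flight:.1f}")     # 1154.4
    print(f"I/Os in flight per node: {cluster_in_flight / NODES:.1f}")  # 11.5
```

On these assumptions, each node sustains under 9,000 IOPS with only about a dozen I/Os in flight, so the headline number comes from scaling out across many nodes rather than from pushing any single node to extreme queue depths.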

I got a screenshot of Elastifile’s implementation below:

I left the Elastifile presentation at Storage Field Day 12 impressed to the max. The deep Computer Science thinking that went into the Bizur algorithm and the design of the Elastifile file system is truly remarkable in addressing a universal requirement of data services.

In summary,

  • Elastifile simplifies integration for most applications through well-known NAS protocols and POSIX file semantics, and it supports RESTful APIs for applications that want object storage integration.
  • Elastifile is all-Flash and supports the high-performance, low-latency transactional storage requirements of applications, whether on-premise or in the cloud
  • Elastifile overcomes the performance limits of distributed file systems, delivering low latency through smart design
  • Elastifile is resilient, and the new Bizur algorithm minimizes the performance impact of node failures
  • Elastifile can grow to web scale and is agnostic to hardware and location (on-premise or in the cloud)

From my point of view, Elastifile is a true engineering and Computer Science company, ready to take on this new world in need of a true globally distributed data services platform.

(Postscript: I was so enthusiastic when Elastifile was announced as a presenter at Storage Field Day 12. I very much wanted to meet Shahar Frank, Elastifile’s CTO. Shahar was one of the founders of XtremIO, which EMC acquired in 2012. He and I exchanged emails several weeks before the EMC acquisition, and this week, almost 5 years later, I was extremely honoured to meet him.)


About cfheoh

I am a technology blogger with 20+ years of IT experience. I write heavily on technologies related to storage networking and data management because that is my area of interest and expertise. I introduce technologies with the objective of getting readers to *know the facts* and to use that knowledge to cut through marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress.

I am involved in SNIA (Storage Networking Industry Association), and as of October 2013, I have been appointed as the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I was previously the Chairman of SNIA Malaysia until Dec 2012.

As of August 2015, I have returned to NetApp as the Country Manager of Malaysia & Brunei. Given my present position, I am not obligated to write about my employer and its technology, but I am subject to the company's Social Media Guidelines. Therefore, I would like to make a disclaimer that what I write is my personal opinion, and mine alone. I am responsible for what I say and write, and this statement indemnifies my employer from any damages.

