I have been earnestly observing the growth of Computational Storage for a number of years now. It has gone by several names over the years, and “in-situ data processing” is the one that stuck with me the most. The Computational Storage nomenclature became more cohesive when SNIA® put together the CMSI (Compute Memory Storage Initiative) some time back. Through this initiative, several standards bodies, the major technology players and several SIGs (special interest groups) in SNIA® collaborated to advance the Computational Storage segment of the storage technology industry we know today.
The use cases for Computational Storage are burgeoning, and its functional implementations are becoming vital to tackling the explosive data tsunami. IDC, in its Worldwide Global DataSphere Forecast 2021-2025 report, predicted that the world will have 175 ZB (zettabytes) of data by 2025. That number, according to hearsay, has since been revised to a heady 250 ZB, given the superlative rate at which data is being originated and spawned.
Computational Storage driving factors
If we take the Computer Science definition of in-situ processing, Computational Storage can be distilled as processing data where it resides. In a nutshell, “Bring Compute closer to Storage“. This means that there is a processing unit within the storage subsystem which does not require the host CPU to perform processing. In a very simplistic manner, a RAID card in a storage array can be considered a Computational Storage device because it performs the RAID functions instead of the host CPU. But this new generation of Computational Storage has much more prowess than just the RAID function in a RAID card.
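To make the “Bring Compute closer to Storage” idea concrete, here is a minimal Python sketch contrasting the two paths. The `PlainDrive` and `ComputationalStorageDrive` classes and their methods are purely illustrative, not a real Computational Storage API; the point is only where the filtering runs and how much data crosses the bus.

```python
# Hypothetical sketch: host-side processing vs. "bringing compute closer
# to storage". All class and method names are illustrative only.

class PlainDrive:
    """Conventional drive: every byte must travel to the host."""
    def __init__(self, records):
        self.records = records

    def read_all(self):
        return list(self.records)          # full dataset crosses the bus

class ComputationalStorageDrive(PlainDrive):
    """CSD sketch: a predicate runs on the drive's own processor."""
    def read_filtered(self, predicate):
        # Filtering happens "in situ"; only matches cross the bus.
        return [r for r in self.records if predicate(r)]

records = [{"sensor": i, "temp": 20 + i % 15} for i in range(1_000)]

# Host-side path: ship all 1,000 records, then filter on the host CPU.
host_hot = [r for r in PlainDrive(records).read_all() if r["temp"] > 30]

# In-situ path: ship only the matching records.
csd_hot = ComputationalStorageDrive(records).read_filtered(
    lambda r: r["temp"] > 30)

assert host_hot == csd_hot                 # same answer, far less data moved
```

Both paths produce identical results; the difference is that the in-situ path moves only the matching records to the host.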
There are many factors in Computational Storage that make a lot of sense. Here are a few:
- Voluminous data inundate the centralized architectures of the cloud platforms and enterprise systems today. Much of the data come from endpoint devices – mobile devices, sensors, IoT, point-of-sale systems, video cameras, et al. Pre-processing the data at the origin points can filter the data, reduce the size to be processed centrally, and secure the data before they are ingested into the central data processing systems.
- Real-time processing of the data at the moment it is received creates the opportunity for the Velocity of Data Analytics. Much of the data do not need to move to a central data processing system for analysis. Use cases like autonomous vehicles, fraud detection, recommendation systems and disaster alerts often require near-instantaneous responses. Performing early data analytics at the data origin point has tremendous advantages.
- Moore’s Law is waning. The CPU (central processing unit) is no longer the center of the universe. We are beginning to see CPU offloading technologies to augment the CPU’s duties such as compression, encryption, transcoding and more. SmartNICs, DPUs (data processing units), VPUs (visual processing units), GPUs (graphics processing units), etc have come forth to formulate a new computing paradigm.
- Freeing up central resources with Computational Storage also accelerates the overall distributed data processing across the whole data architecture. The CPU and its adjoining memory subsystem spend less time context switching on I/O interrupts, as they do in most compute/storage architectures today. The net effect relieves the CPU, giving back more CPU cycles for higher-level processing tasks and resulting in faster performance overall.
- The rise of memory interconnects is enabling a more distributed computing fabric of data processing subsystems. The CXL (Compute Express Link™) interconnect protocol, especially after the Gen-Z annex, has emerged as a force to be reckoned with. This rise of memory interconnects will likely strengthen the case for Computational Storage in the fast-approaching future.
Computational Storage Deployment Models
SNIA® has proposed several deployment models. These models are the main starting points from which Computational Storage begins. The diagram above sums up the deployment models, and they include:
- Computational Storage Processors (CSPs)
- Computational Storage Drives (CSDs)
- Computational Storage Arrays (CSAs)
Here is a summary slide that shows the current computational storage instances:
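As a rough mental model of how the three roles differ, here is a toy sketch, assuming the SNIA definitions: a CSP provides compute but no persistent storage of its own, a CSD couples compute with its own persistent storage, and a CSA aggregates drives behind array-level compute. The `Csp`, `Csd` and `Csa` classes are illustrative stand-ins, not a real API.

```python
# Toy sketch of the three SNIA deployment models. The classes are
# illustrative only; real devices expose these roles through standard
# interfaces such as NVMe, not Python objects.

class Csp:
    """Computational Storage Processor: compute with no persistent
    storage of its own; it operates on data from associated storage."""
    def run_csf(self, csf, data):
        return csf(data)

class Csd:
    """Computational Storage Drive: persistent storage plus compute."""
    def __init__(self, data):
        self.data = data
    def run_csf(self, csf):
        return csf(self.data)              # compute happens on the drive

class Csa:
    """Computational Storage Array: a collection of drives fronted by
    array-level compute."""
    def __init__(self, drives):
        self.drives = drives
    def run_csf(self, csf):
        return [d.run_csf(csf) for d in self.drives]

total = sum                                # a trivial CSF: aggregation

array = Csa([Csd([1, 2, 3]), Csd([4, 5, 6])])
assert array.run_csf(total) == [6, 15]     # each drive aggregates in place
assert Csp().run_csf(total, [7, 8]) == 15  # CSP computes on supplied data
```

The same Computational Storage Function can be deployed at any of the three levels; only the placement of the compute changes.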
SNIA®, with the help of many supporting organizations and technology vendors, has been diligently interweaving the Computational Storage ecosystem. I have been receiving emails and invitations to Computational Storage Geek Out sessions, the video library, webinars, white papers and more.
ASIC and FPGA in Computational Storage
The presence of ASICs (application specific integrated circuits) and FPGAs (field programmable gate arrays) is strong in Computational Storage, especially FPGAs. This is driven by the flexibility of repurposing the FPGA for the many different functions required in Computational Storage. These functions, defined as CSFs (Computational Storage Functions), include:
- Storage – RAID, Erasure Coding
- Data reduction – Compression, Deduplication, Compaction
- Data encryption
- ML/AI (Machine Learning/Artificial Intelligence)
- Transcoding and Transcribing
- Data handling, and monitoring at the ingestion layer (Scan/Parse, Filter/Select, Aggregate/Join)
- Data classification and basic taxonomy to label and tag the data for compliance and data management
- Remote core dump processing
- Decentralized storage mining/farming
- Acceleration – Many FPGAs and some ASICs perform accelerated processing with high bandwidth due to data locality
- Edge and real time computing with in-memory processing – new distributed computing development as seen in the emerging new breed of data processing frameworks.
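As one concrete CSF from the list above, here is a minimal sketch of transparent compression on the write path, with Python's `zlib` standing in for the drive's hardware compression engine. The `CompressingCsd` class is hypothetical, purely for illustration.

```python
# Sketch of a data-reduction CSF: in-line compression on the write path.
# zlib stands in for the drive's compression engine; the host never
# spends CPU cycles on compressing or decompressing.
import zlib

class CompressingCsd:
    def __init__(self):
        self.blocks = {}

    def write(self, lba, data: bytes):
        self.blocks[lba] = zlib.compress(data)   # compressed on the drive

    def read(self, lba) -> bytes:
        return zlib.decompress(self.blocks[lba]) # decompressed on the drive

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())

csd = CompressingCsd()
payload = b"sensor,temp\n" * 4096                # highly repetitive telemetry
csd.write(0, payload)

assert csd.read(0) == payload                    # lossless round trip
assert csd.physical_bytes() < len(payload)       # fewer bytes actually stored
```

From the host's point of view the drive behaves like ordinary block storage; the data reduction happens entirely inside the device.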
Turning data into Valuable Data faster
I have pointed out in the past that moving data is expensive. Data mobility not only costs time and money; the tail latency effects also have significant implications for the data processing pipeline. This was my Storage Elephant Compute Birds analogy, where it is easier for the birds to fly to where the elephants are than to have the elephants move to where the birds roost. Data are the elephants; Compute are the birds.
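A back-of-the-envelope calculation shows the cost of moving the elephants. The 10 Gbit/s link speed and the 1% filter selectivity here are assumed figures, chosen purely for illustration.

```python
# Back-of-the-envelope support for "moving data is expensive": shipping
# a full 20 TB drive's contents over a 10 Gbit/s link, versus shipping
# only the (assumed) 1% of records an in-situ filter deems relevant.

TB = 10**12
drive_bytes = 20 * TB
link_bytes_per_s = 10e9 / 8                # 10 Gbit/s -> 1.25 GB/s

move_everything_s = drive_bytes / link_bytes_per_s
move_filtered_s = (0.01 * drive_bytes) / link_bytes_per_s

print(f"full drive : {move_everything_s / 3600:.1f} hours")   # 4.4 hours
print(f"1% filtered: {move_filtered_s / 60:.1f} minutes")     # 2.7 minutes
```

Even before tail latency is considered, filtering in place collapses hours of bulk transfer into minutes.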
Storing data everywhere is easy. A hard disk drive now easily reaches 20 TB, and that capacity is expected to double in the next 3-4 years. Fast data is easy with solid state drives and NVMe (Non-Volatile Memory express) interfaces. But how much of that data is of importance and value if the syntax and the context of the data are not determined at the location where it was created or collected, and the data is not processed quickly? In many endpoints, the speed of insights from data processed in-situ is the value. The ability to respond to these in-situ insights within seconds or even milliseconds is the value. These are the velocity and locality advantages that Computational Storage brings to CPU offloading and Edge Computing.
In the larger scheme of things, we are also seeing the birth of a new distributed computing and data processing paradigm, thanks to the emergence of Computational Storage.