The Starbucks model for Storage-as-a-Service

Starbucks™ is not a coffee shop. It purveys beyond coffee and tea, and food and puts together the yuppie beverages experience. The intention is to get the customers to stay as long as they can, and keep purchasing the Starbucks’ smorgasbord of high margin provisions in volume. Wifi, ambience, status, coffee or tea with your name on it (plenty of jokes and meme there), energetic baristas and servers, fancy coffee roasts and beans et. al. All part of the Starbucks™-as-a-Service pleasurable affair that intends to lock the customer in and have them keep coming back.

The Starbucks experience

Data is heavy and they know it

Unlike compute and network infrastructures, storage infrastructures holds data persistently and permanently. Data has to land on a piece of storage medium. Coupled that with the fact that data is heavy, forever growing and data has gravity, you have a perfect recipe for lock-in. All storage purveyors, whether they are on-premises data center enterprise storage or public cloud storage, and in between, there are many, many methods to keep the data chained to a storage technology or a storage service for a long time. The storage-as-a-service is like tying the cow to the stake and keeps on milking it. This business model is very sticky. This stickiness is also a lock-in mechanism.

Continue reading

DellEMC Project Nautilus Re-imagine Storage for Streams

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog is of my own opinions and views ]

Cloud computing will have challenges processing data at the outer reach of its tentacles. Edge Computing, as it melds with the Internet of Things (IoT), needs a different approach to data processing and data storage. Data generated at source has to be processed at source, to respond to the event or events which have happened. Cloud Computing, even with 5G networks, has latency that is not sufficient to how an autonomous vehicle react to pedestrians on the road at speed or how a sprinkler system is activated in a fire, or even a fraud detection system to signal money laundering activities as they occur.

Furthermore, not all sensors, devices, and IoT end-points are connected to the cloud at all times. To understand this new way of data processing and data storage, have a look at this video by Jay Kreps, CEO of Confluent for Kafka® to view this new perspective.

Data is continuously and infinitely generated at source, and this data has to be compiled, controlled and consolidated with nanosecond precision. At Storage Field Day 19, an interesting open source project, Pravega, was introduced to the delegates by DellEMC. Pravega is an open source storage framework for streaming data and is part of Project Nautilus.

Rise of  streaming time series Data

Processing data at source has a lot of advantages and this has popularized Time Series analytics. Many time series and streams-based databases such as InfluxDB, TimescaleDB, OpenTSDB have sprouted over the years, along with open source projects such as Apache Kafka®, Apache Flink and Apache Druid.

The data generated at source (end-points, sensors, devices) is serialized, timestamped (as event occurs), continuous and infinite. These are the properties of a time series data stream, and to make sense of the streaming data, new data formats such as Avro, Parquet, Orc pepper the landscape along with the more mature JSON and XML, each with its own strengths and weaknesses.

You can learn more about these data formats in the 2 links below:

DIY is difficult

Many time series projects started as DIY projects in many organizations. And many of them are still DIY projects in production systems as well. They depend on tribal knowledge, and these databases are tied to an unmanaged storage which is not congruent to the properties of streaming data.

At the storage end, the technologies today still rely on the SAN and NAS protocols, and in recent years, S3, with object storage. Block, file and object storage introduce layers of abstraction which may not be a good fit for streaming data.

Continue reading