All the Sources and Sinks going to Object Storage

The vocabulary of sources and sinks are beginning to appear in the world of data storage as we witness the new addition of data processing frameworks and the applications in this space. I wrote about this in my blog “Rethinking data. processing frameworks systems in real time” a few months ago, introducing my take on this budding new set of I/O characteristics and data ecosystem. I also started learning about the Kappa Architecture (and Lambda as well), a framework designed to craft and develop a set of amalgamated technologies to handle stream processing of a series of data in relation to time.

In Computer Science, sources and sinks are considered external entities that often serve as connectors of input and output of disparate systems. They are often not in the purview of data storage architects. Also often, these sources and sinks are viewed as black boxes, and their inner workings are hidden from the views of the data storage architects.

Diagram from https://developer.here.com/documentation/get-started/dev_guide/shared_content/topics/olp/concepts/pipelines.html

The changing facade of data stream processing presents the constant motion of data, the continuous data being altered as it passes through the many integrated sources and sinks. We are also see much of the data processed in-memory as much as possible. Thus, the data services from a traditional storage model of SAN and NAS may straggle with the requirements demanded by this new generation of data stream processing.

As the world of traditional data storage processing is expanding into data streams processing and vice versa, and the chatter of sources and sinks can no longer be ignored.

Continue reading

DellEMC Project Nautilus Re-imagine Storage for Streams

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog is of my own opinions and views ]

Cloud computing will have challenges processing data at the outer reach of its tentacles. Edge Computing, as it melds with the Internet of Things (IoT), needs a different approach to data processing and data storage. Data generated at source has to be processed at source, to respond to the event or events which have happened. Cloud Computing, even with 5G networks, has latency that is not sufficient to how an autonomous vehicle react to pedestrians on the road at speed or how a sprinkler system is activated in a fire, or even a fraud detection system to signal money laundering activities as they occur.

Furthermore, not all sensors, devices, and IoT end-points are connected to the cloud at all times. To understand this new way of data processing and data storage, have a look at this video by Jay Kreps, CEO of Confluent for Kafka® to view this new perspective.

Data is continuously and infinitely generated at source, and this data has to be compiled, controlled and consolidated with nanosecond precision. At Storage Field Day 19, an interesting open source project, Pravega, was introduced to the delegates by DellEMC. Pravega is an open source storage framework for streaming data and is part of Project Nautilus.

Rise of  streaming time series Data

Processing data at source has a lot of advantages and this has popularized Time Series analytics. Many time series and streams-based databases such as InfluxDB, TimescaleDB, OpenTSDB have sprouted over the years, along with open source projects such as Apache Kafka®, Apache Flink and Apache Druid.

The data generated at source (end-points, sensors, devices) is serialized, timestamped (as event occurs), continuous and infinite. These are the properties of a time series data stream, and to make sense of the streaming data, new data formats such as Avro, Parquet, Orc pepper the landscape along with the more mature JSON and XML, each with its own strengths and weaknesses.

You can learn more about these data formats in the 2 links below:

DIY is difficult

Many time series projects started as DIY projects in many organizations. And many of them are still DIY projects in production systems as well. They depend on tribal knowledge, and these databases are tied to an unmanaged storage which is not congruent to the properties of streaming data.

At the storage end, the technologies today still rely on the SAN and NAS protocols, and in recent years, S3, with object storage. Block, file and object storage introduce layers of abstraction which may not be a good fit for streaming data.

Continue reading

Thinking small to solve Big

[This article was posted in my LinkedIn at https://www.linkedin.com/pulse/thinking-small-solve-big-chin-fah-heoh/ on Sep 9th 2019]

The world’s economy has certainly turned. And organizations, especially the SMEs, are demanding more. There were times that many technology vendors and their tier 1 systems integrators could get away with plenty of high level hobnobbing, and showering the prospect with their marketing wow-factor. But those fancy, smancy days are drying up and SMEs now do a lot of research and demand a more elaborate and a more comprehensive technology solution to their requirements.

The SMEs have the same problems faced by the larger organizations. They want more data stored, protected and recoverable, and maximize the value of data. However, their risk factors are much higher than the larger enterprises, because a disruption or a simple breakdown could affect their business and operations far greater than larger organizations. In most situations, they have no safety net.

So, the past 3 odd years, I have learned that as a technology solution provider, as a systems integrator to SMEs, I have to be on-the-ball with their pains all the time. And I have to always remember that they do not have the deep pockets, especially when the economy in Malaysia has been soft for years.

That is why I have gravitated to technology solutions that matter to the SMEs and gentle to their pockets as well. Take for instance a small company called Itxotic I discovered earlier this year. Itxotic is a 100% Malaysian home-grown technology startup, focusing on customized industry intelligence, notably computer vision AI. Their prominent technology include defect detection in a manufacturing production line.

 

At the Enterprise level, it is easy for large technology providers like Hitachi or GE or Siemens to peddle similar high-tech solutions to SMEs requirements. But this would come with a price tag of hundreds of thousands of ringgit. SMEs will balk at such a large investment because the price tag is definitely something not comprehensible to the SME factories. That is why I gravitated to the small thinking of Itxotic, where their small, yet powerful technology solves big problems in the SMEs.

And this came about when more Industry 4.0 opportunities started to come into my radar. Similarly, I was also approached to look into a edge-network data analytics technology to be integrated into PLCs (programmable logic controllers). At present, the industry consultants who invited me, are peddling a foreign technology solution, and the technology costs RM13,000 per CPU core. In a typical 4-core processor IPC (industrial PC), that is a whopping RM52,000, minus the hardware and integration services. This can easily drive up the selling price of over RM100K, again, a price tag that will trigger a mini heart attack with the SMEs.

I am tasked by the industry consultants to design a more cost-friendly, aka cheaper solution and today, we are already building an alternative with Apache Kafka, its connectors and Grafana for visual reporting. And I think the cost to build this alternative technology will be probably 70-80% cheaper than the one they are reselling now. The “think small, solve Big” mantra is beginning to take hold, and I am excited about it.

In the “small” mantra, I mean to be intimate and humble with the end users. One lesson I have learned over the past years is, the SMEs count on their technology partners to be with them. They have no room for failure because a costly failure is likely to be devastating to their operations and business. Know the technology you are pitching well, so that the SMEs are confident that you can deliver, not some over-the-top high-level technology pitch. Look deep into the technology integration with their existing technology and operations, and carefully and meticulously craft and curate a well mapped plan for them. Commit to their journey to ensure their success.

I have often seen technology vendors and resellers leaving SMEs high and dry when it comes to something outside their scope, and this has been painful. That is why this isn’t a downgrade for me when I started working with the SMEs more often in the past 3 years, even though I have served the enterprise for more than 25 years. This invaluable lesson is an upgrade for me to serve my SME customers better.

Continue reading