Down the rabbit hole with Kubernetes Storage

Kubernetes is on fire. Last week VMware® released the State of Kubernetes 2020 report, which surveyed companies with 1,000 employees and above. The results were not surprising, as adoption of this nascent technology is booming. But persistent storage remains a nagging concern for Kubernetes as it serves infrastructure resources to application instances running in the containers of a pod in a cluster.

The standardization of storage resources has settled on CSI (Container Storage Interface). Storage vendors have almost, kind of, sort of agreed that API objects such as PersistentVolumes, PersistentVolumeClaims and StorageClasses, along with their parameters, would be the way to request storage resources from the Pre-provisioned Volumes via the CSI driver plug-in. There are already more than 50 vendor-specific CSI drivers on GitHub.
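
To make those API objects concrete, here is a minimal sketch of a StorageClass and a PersistentVolumeClaim that refers to it, built as plain Python dictionaries and printed as JSON. The provisioner name `csi.example.com`, the object names `fast-csi` and `app-data`, and the `type` parameter are all placeholders of my own and do not belong to any real CSI driver. Each vendor's driver documents its own provisioner name and parameters.

```python
import json

# Placeholder values: "csi.example.com", "fast-csi", "app-data" and the
# "type" parameter are hypothetical and not taken from any real CSI driver.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast-csi"},
    "provisioner": "csi.example.com",   # name registered by the CSI driver
    "parameters": {"type": "ssd"},      # driver-specific, opaque to Kubernetes
}

# The claim asks for storage by size and access mode, and points at the
# StorageClass by name instead of naming a specific volume.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": storage_class["metadata"]["name"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# Bundle both objects into a single List document and print it as JSON.
manifest = {"apiVersion": "v1", "kind": "List", "items": [storage_class, claim]}
print(json.dumps(manifest, indent=2))
```

The point of the split is that the application only ever sees the claim. Which vendor sits behind the StorageClass can change without touching the pod definition.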

[ Figure: Kubernetes and the CSI (Container Storage Interface) logos ]

The CSI plug-in method is the only way for Kubernetes to scale and keep its dynamic, loadable storage resource integration with external third-party vendors, all clamouring to grab a piece of this burgeoning demand both in the cloud and in the enterprise.

DellEMC Project Nautilus Re-imagine Storage for Streams

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in Silicon Valley, USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer, and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog reflects my own opinions and views ]

Cloud computing will have challenges processing data at the outer reach of its tentacles. Edge Computing, as it melds with the Internet of Things (IoT), needs a different approach to data processing and data storage. Data generated at source has to be processed at source, to respond to the event or events which have happened. Cloud Computing, even with 5G networks, has latency too high for an autonomous vehicle to react to pedestrians on the road at speed, for a sprinkler system to activate in a fire, or even for a fraud detection system to flag money laundering activities as they occur.

Furthermore, not all sensors, devices, and IoT end-points are connected to the cloud at all times. To understand this new way of data processing and data storage, have a look at this video by Jay Kreps, CEO of Confluent, on Kafka® for this new perspective.

Data is continuously and infinitely generated at source, and this data has to be compiled, controlled and consolidated with nanosecond precision. At Storage Field Day 19, an interesting open source project, Pravega, was introduced to the delegates by DellEMC. Pravega is an open source storage framework for streaming data and is part of Project Nautilus.

Rise of streaming time series data

Processing data at source has a lot of advantages and this has popularized Time Series analytics. Many time series and streams-based databases such as InfluxDB, TimescaleDB and OpenTSDB have sprouted over the years, along with open source projects such as Apache Kafka®, Apache Flink and Apache Druid.

The data generated at source (end-points, sensors, devices) is serialized, timestamped (as the event occurs), continuous and infinite. These are the properties of a time series data stream, and to make sense of the streaming data, new data formats such as Avro, Parquet and ORC pepper the landscape along with the more mature JSON and XML, each with its own strengths and weaknesses.
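
Those properties can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `SensorEvent` record, its field names and the `sensor-1` identifier are made up, and JSON is used simply because it ships with the standard library. A production pipeline would more likely pick a schema-based format such as Avro.

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Iterable, Iterator


@dataclass
class SensorEvent:
    """Hypothetical event record: one reading from one sensor."""
    sensor_id: str
    timestamp_ns: int   # stamped at the moment the event is created
    value: float


def event_stream(sensor_id: str, readings: Iterable[float]) -> Iterator[SensorEvent]:
    # A generator models the "continuous and infinite" nature of the stream:
    # events are produced one at a time, for as long as readings keep coming.
    for value in readings:
        yield SensorEvent(sensor_id, time.time_ns(), float(value))


def serialize(event: SensorEvent) -> bytes:
    # Turn the event into bytes so it can be sent over the wire or stored.
    return json.dumps(asdict(event)).encode("utf-8")


def deserialize(payload: bytes) -> SensorEvent:
    return SensorEvent(**json.loads(payload.decode("utf-8")))


# A short, finite list of readings stands in for the endless stream.
encoded = [serialize(e) for e in event_stream("sensor-1", [21.5, 21.7, 22.0])]
decoded = [deserialize(p) for p in encoded]
for event in decoded:
    print(event)
```

The timestamp is attached where the event is generated, not where it is eventually stored, which is what lets downstream systems reason about event time.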

You can learn more about these data formats in the 2 links below:

DIY is difficult

Many time series projects started as DIY projects in many organizations. And many of them are still DIY projects in production systems as well. They depend on tribal knowledge, and these databases are tied to unmanaged storage that is not congruent with the properties of streaming data.

At the storage end, the technologies today still rely on the SAN and NAS protocols, and in recent years, S3, with object storage. Block, file and object storage introduce layers of abstraction which may not be a good fit for streaming data.

AI needs data we can trust

[ Note: This article was published on LinkedIn on Jan 21st 2020. Here is the link to the original article ]

In 2020, the intensity on the topic of Artificial Intelligence will further escalate.

One piece of news that came out last week terrified me. The Sarawak courts want to apply Artificial Intelligence to mete out judgment and punishment, perhaps on a small scale.

NAS is the next Ransomware goldmine

I get an email like this almost every day:

It is from the daily security run logs of one of my FreeNAS customers, emailed to our support@katanalogic.com alias. The log shows a brute-force attack attempting to crack the authentication barrier via the exposed SSH port.

Just days after the installation was completed months ago, a bot started doing IP port scans on our system, and found the SSH port open. (We used it for remote support.) It has been trying ever since, and we have been observing the source IP addresses.
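
Observing the source IP addresses can be as simple as counting failed logins per address. The sketch below is an assumption-laden illustration, not the customer's actual log or our actual tooling: the sample lines are invented in the style of OpenSSH `sshd` messages, the host name `nas` and the user names are made up, and the IP addresses come from the ranges reserved for documentation.

```python
import re
from collections import Counter

# Invented sample lines in the style of OpenSSH sshd messages.
# These are NOT real log entries; the IPs are documentation-only addresses.
SAMPLE_LOG = """\
Jan 10 03:12:01 nas sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 51122 ssh2
Jan 10 03:12:04 nas sshd[1234]: Failed password for root from 203.0.113.5 port 51130 ssh2
Jan 10 03:12:09 nas sshd[1240]: Failed password for invalid user test from 198.51.100.7 port 40022 ssh2
Jan 10 03:15:30 nas sshd[1302]: Accepted publickey for support from 192.0.2.10 port 50500 ssh2
"""

# Captures the user name and the source address of a failed password attempt.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+"
)


def failed_attempts_by_ip(log_text: str) -> Counter:
    """Count failed password attempts per source IP address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


attempts = failed_attempts_by_ip(SAMPLE_LOG)
for ip, count in attempts.most_common():
    print(ip, count)
```

An address that keeps climbing in this count over days is a good candidate for a firewall block, or a reason to stop exposing SSH to the internet at all.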

The new Ransomware attack vector

This is not surprising to me. Ransomware has become more sophisticated and more damaging than ever because its monetary returns are far more lucrative than those of other cybersecurity threats so far. And the easiest prey is the weakest link in the People, Process and Technology chain. Phishing through social engineering and email is the most common attack vector, but there are vishing (via voice calls or voicemail) and smishing (via SMS) out there too. Of course, we do not discount other attack vectors such as malvertising sites, exploits and so on. Anything to deliver the ransomware payload.

The new attack vector is via NAS (Network Attached Storage), and it is easy to understand why.

Thinking small to solve Big

[This article was posted in my LinkedIn at https://www.linkedin.com/pulse/thinking-small-solve-big-chin-fah-heoh/ on Sep 9th 2019]

The world’s economy has certainly turned. And organizations, especially the SMEs, are demanding more. There were times when many technology vendors and their tier 1 systems integrators could get away with plenty of high level hobnobbing, and showering the prospect with their marketing wow-factor. But those fancy-schmancy days are drying up and SMEs now do a lot of research and demand a more elaborate and more comprehensive technology solution to their requirements.

The SMEs have the same problems faced by the larger organizations. They want more data stored, protected and recoverable, and to maximize the value of data. However, their risk factors are much higher than those of the larger enterprises, because a disruption or a simple breakdown could affect their business and operations far more than it would larger organizations. In most situations, they have no safety net.

So, over the past 3 odd years, I have learned that as a technology solution provider, as a systems integrator to SMEs, I have to be on-the-ball with their pains all the time. And I have to always remember that they do not have deep pockets, especially when the economy in Malaysia has been soft for years.

That is why I have gravitated to technology solutions that matter to the SMEs and are gentle on their pockets as well. Take for instance a small company called Itxotic I discovered earlier this year. Itxotic is a 100% Malaysian home-grown technology startup, focusing on customized industry intelligence, notably computer vision AI. Their prominent technology includes defect detection in a manufacturing production line.

At the Enterprise level, it is easy for large technology providers like Hitachi or GE or Siemens to peddle similar high-tech solutions for SMEs’ requirements. But this would come with a price tag of hundreds of thousands of ringgit. SMEs will balk at such a large investment because the price tag is definitely out of reach for SME factories. That is why I gravitated to the small thinking of Itxotic, where their small, yet powerful technology solves big problems in the SMEs.

And this came about when more Industry 4.0 opportunities started to come onto my radar. Similarly, I was also approached to look into an edge-network data analytics technology to be integrated into PLCs (programmable logic controllers). At present, the industry consultants who invited me are peddling a foreign technology solution, and the technology costs RM13,000 per CPU core. In a typical 4-core processor IPC (industrial PC), that is a whopping RM52,000, excluding the hardware and integration services. This can easily drive the selling price up to over RM100K, again, a price tag that will trigger a mini heart attack among the SMEs.
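
The per-core arithmetic is worth spelling out, because it shows how quickly the software licence alone scales with the processor. In this small sketch the RM13,000 per core and the 4-core IPC come from the figures above, while the function name and the other core counts are my own illustration.

```python
PRICE_PER_CORE_RM = 13_000   # licence price per CPU core, from the text
CORES_PER_IPC = 4            # typical industrial PC, from the text


def licence_cost(cores: int, price_per_core: int = PRICE_PER_CORE_RM) -> int:
    """Software licence cost in ringgit, excluding hardware and services."""
    return cores * price_per_core


software_cost = licence_cost(CORES_PER_IPC)
print(f"{CORES_PER_IPC} cores: RM{software_cost:,}")

# Hypothetical core counts, only to show how the licence scales.
for cores in (2, 8):
    print(f"{cores} cores: RM{licence_cost(cores):,}")
```

Because the price is tied to cores and not to the value delivered, simply choosing a beefier IPC multiplies the bill before any hardware or integration work is counted.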

I am tasked by the industry consultants to design a more cost-friendly, aka cheaper, solution and today, we are already building an alternative with Apache Kafka, its connectors and Grafana for visual reporting. And I think the cost to build this alternative technology will probably be 70-80% cheaper than the one they are reselling now. The “think small, solve Big” mantra is beginning to take hold, and I am excited about it.

In the “small” mantra, I mean to be intimate and humble with the end users. One lesson I have learned over the past years is, the SMEs count on their technology partners to be with them. They have no room for failure because a costly failure is likely to be devastating to their operations and business. Know the technology you are pitching well, so that the SMEs are confident that you can deliver, not some over-the-top high-level technology pitch. Look deep into the technology integration with their existing technology and operations, and carefully and meticulously craft and curate a well mapped plan for them. Commit to their journey to ensure their success.

I have often seen technology vendors and resellers leaving SMEs high and dry when it comes to something outside their scope, and this has been painful. That is why it wasn’t a downgrade for me when I started working with the SMEs more often in the past 3 years, even though I have served the enterprise for more than 25 years. This invaluable lesson is an upgrade for me to serve my SME customers better.

Intel IoT Revolution for Malaysia Industry 4.0

Intel rocks!

I have been following Intel for a few years now, in large part because of their push of the 3D XPoint technology. Under the Optane brand, Intel has several forms of media types, addressing persistent memory, storage class memory and solid state storage. Intel, in recent years, has been more at the forefront with their larger technology portfolio and it is not just about their processors anymore. One of the bright areas I am seeing myself getting more engrossed in (and involved in) is their IoT (Internet of Things) portfolio, and it has been very exciting so far.

Intel IoT and Deep Learning Frameworks

The efforts of the Intel IoTG (Internet of Things Group) in Asia Pacific are rapidly gaining recognition. The drive of the Industry 4.0 revolution is strong. And I saw the brightest spark of the Intel folks pushing the Industry 4.0 message on home ground in Malaysia.

After the large showing by Intel at the Semicon event 2 months ago, they turned it up a notch in Penang at their own Intel IoT Summit 2019, which concluded last week.

At the event, Intel brought out their solid engineering geeks. There were plenty of talks and workshops on Deep Learning, AI and Neural Networks, with chatter about Nervana, Nauta and Saffron. Despite all the technology and engineering prowess Intel was showcasing, there was a worrying gap.

Storage Performance Considerations for AI Data Paths

The hype of Deep Learning (DL), Machine Learning (ML) and Artificial Intelligence (AI) has reached an unprecedented frenzy. Every infrastructure vendor from servers, to networking, to storage has a word to say or play about DL/ML/AI. This prompted me to explore this hyped ecosystem from a storage perspective, notably from a storage performance requirement point-of-view.

One question on my mind

There are plenty of questions on my mind. One stood out and that is related to storage performance requirements.

Reading and learning from one storage technology vendor to another, the context of everyone’s play against their competitors seems to be “They are archaic, they are legacy. Our architecture is built from ground up, modern, NVMe-enabled”. And there is more juxtaposing, but you get the picture – “We are better, no doubt”.

Are the data patterns and behaviours of AI different? How do they affect the storage design as the data moves through the workflow, the data paths and the lifecycle of the AI ecosystem?

The Heart of Digital Transformation is …

Businesses have taken up Digital Transformation in different ways and at different paces. In Malaysia, company boardrooms are accepting Digital Transformation as a core strategic initiative, crucial to developing competitive advantage in their respective industries. Time and time again, we are reminded that Data is the lifeblood and Data fuels the Digital Transformation initiatives.

The rise of CDOs

In line with the rise of the Digital Transformation buzzword, I have seen several unique job titles coming up over the past few years. Among those titles, “Chief Digital Officer”, “Chief Data Officer” and “Chief Experience Officer” are some eye-catching ones. I have met a few of them, and so far, those I met were outward facing, customer facing. In most of my conversations with them, they projected a front that their organization, their business and operations have been digitally transformed. They are ready to help their customers to transform. Are they?

Tech vendors add more fuel

The technology vendors have an agenda to sell their solutions and their services. They paint aesthetically pleasing stories of how their solutions and wares can digitally transform any organization, and customers latch on to this ‘shiny’ tech. End users get too fixated on the idea that technology is the core of Digital Transformation. They are wrong.

Missing the Forest

As I gather more insights through observations, and more conversations and more experiences, I think most of the “digital transformation ready” organizations are not adopting the right approach to Digital Transformation.

Digital Transformation is not tactical. It is not a one-time, big bang action that shifts from not-digitally-transformed to digitally-transformed in a moment. It is not a sprint. It is a marathon. It is a journey that will take time to mature. IDC was spot-on when it first released its Digital Transformation MaturityScape Framework years ago.

IDC Digital Transformation Maturityscape

Is AI my friend?

I am sorry, Dave …

Let’s start this story with 2 supposed friends – Dave and Hal.

How do we become friends?

We have friends and we have enemies. We become friends when trust is established. Trust is established when there is an unsaid pact, a silent agreement that I can rely on you to keep my secrets private. I will know full well that you will protect my personal details with a strong conviction. Your decisions and your actions towards me are in my best interest, unbiased and would benefit both me and you.

I feel secure with you.

AI is my friend

When the walls of uncertainty and falsehood are broken down, we trust our friends more and more. We share deeper secrets with our friends when we believe that our privacy and safety are safeguarded and protected. We know well that we can rely on them and that their decisions and actions towards us are reliable and unbiased.

AI, can I count on you to protect my privacy and give me security that my personal data is not abused in the hands of the privileged few?

AI, can I rely on you to be ethical, unbiased and give me the confidence that your decisions and actions are for the benefit and the good of me, myself and I?

My AI friends (maybe)

As I have said before, I am not a skeptic. When there is plenty of relevant, unbiased data fed into the algorithms of AI, the decisions are fair. People accept these AI decisions when the degree of accuracy is very close to the Truth. The higher the accuracy, the greater the Truth. The greater the Truth, the more confident people are towards the AI system.

Here are some AI “friends” in the news:

But we have to be careful here as well. Accuracy can be subjective, paradoxical and enigmatic. When ethics are violated, we terminate the friendship and we reject the “friend”. We categorically label him or her as an enemy. We constantly have to check, just like we might, once in a while, check up on our friends too.

In Conclusion

AI, can we be friends now?

[Apology: sorry about the Cyberdyne link 😉 ]

[This blog was posted in LinkedIn on Apr 19th 2019]

Data Privacy First before AI Framework

A few days ago, I discovered that Malaysia already had plans for a National Artificial Intelligence (AI) Framework. It is led by Malaysia Digital Economy Corporation (MDEC) and it will be ready by the end of 2019. A Google search revealed a lot of news and announcements, with a few dating back to 2017, but little information on the framework itself. Then again, Malaysia likes to take the “father knows best” approach, and assumes that what it is doing shouldn’t be questioned (much). I will leave this part as it is, because perhaps the details of the framework are under the OSA (Official Secrets Act).

Are we AI responsible or are we responsible for AI?

But I would like to highlight the data privacy part that is likely to figure strongly in the AI Framework, because the ethical use of AI is paramount. It will have economic, social and political impact on Malaysians, and everybody else too. I have written a few articles on LinkedIn about ethics, data privacy, data responsibility and the impact of AI. You can read about them in the links below:

I may sound like a skeptic of AI. I am not. I believe AI will benefit mankind, and bring far reaching developments to society as a whole. But we have to be careful, and this is my MAIN concern when I voice my views about AI. I continue to question the human ethics and the human biases that go into the algorithms that define AI. This has always been the crux of my gripes, my concerns, my skepticism of everything we call AI. I am not against AI but I am against the human flaws that shape the algorithms of AI.

Everything is a Sheep (or a Giraffe)

A funny story was shared with me last year. It was about the Microsoft Azure computer vision algorithm recognizing visuals in photos. Apparently the Microsoft Azure neural network was fed with some overzealous data of sheep (or giraffes), and the AI system started to point out that every spot it “saw” was a sheep, and that anything long and vertical was a giraffe.

In the photo below, there was a bunch of sheep in a tree. Check out the tags/comments in the red rectangle published by the AI neural network software below and see what both Microsoft Azure and NeuralTalk2 “saw” in the photo. You can read more about the funny story here.

This proves my point that if you feed the learning system and the AI behind it with biased and flawed information, the result can be funny (in this case here) or disastrous.