Storage Performance Considerations for AI Data Paths

The hype of Deep Learning (DL), Machine Learning (ML) and Artificial Intelligence (AI) has reached an unprecedented frenzy. Every infrastructure vendor from servers, to networking, to storage has a word to say or play about DL/ML/AI. This prompted me to explore this hyped ecosystem from a storage perspective, notably from a storage performance requirement point-of-view.

One question on my mind

There are plenty of questions on my mind. One stood out and that is related to storage performance requirements.

Reading and learning from one storage technology vendor to another, the context of everyone’s play against their competitors seems to be  “They are archaic, they are legacy. Our architecture is built from ground up, modern, NVMe-enabled“. And there are more juxtaposing, but you get the picture – “We are better, no doubt“.

Are the data patterns and behaviours of AI different? How do they affect the storage design as the data moves through the workflow, the data paths and the lifecycle of the AI ecosystem?

Continue reading

Did Cloud Kill LTFS?

I like LTFS (Linear Tape File System). I was hoping it would take off but it has not. And looking at its future, its significance is becoming less and less relevant. I look if Cloud has been a factor in the possible demise of LTFS in the next few years.

What is LTFS?

In a nutshell, Linear Tape File System makes LTO tapes look like a disk with a file system. It takes a tape and divides it into 2 partitions:

  • Index Partition (XML Index Schema with file names, metadata and attributes details)
  • Data Partition (where the data resides)

Diagram from https://www.snia.org/sites/default/orig/SDC2011/presentations/tuesday/DavidPease_LinearTape_File_System.pdf

It has a File System module which is implemented in supported OS of Unix/Linux, MacOS and Windows. And the mounted file system “tape partition” shows up as a drive or device.

Assassination attempts

There were many attempts to kill off tapes and so far, none has been successful.

Among the “tape-killer” technologies, I think the most prominent one is the VTL (Virtual Tape Library). There were many VTLs I encountered during my days in mid-2000s. NetApp had Alacritus and EMC had Clariion Disk Libraries. There were also IBM ProtecTIER, FalconStor VTL (which is still selling today) among others and Sepaton (read in reverse is “No Tapes’). Sepaton was acquired by Hitachi Data Systems several years back. Continue reading

Is AI my friend?

I am sorry, Dave …

Let’s start this story with 2 supposed friends – Dave and Hal.

How do we become friends?

We have friends and we have enemies. We become friends when trust is established. Trust is established when there is an unsaid pact, a silent agreement that I can rely on you to keep my secrets private. I will know full well that you will protect my personal details with a strong conviction. Your decisions and your actions towards me are in my best interest, unbiased and would benefit both me and you.

I feel secure with you.

AI is my friend

When the walls of uncertainty and falsehood are broken down, we trust our friends more and more. We share deeper secrets with our friends when we believe that our privacy and safety are safeguarded and protected. We know well that we can rely on them and their decisions and actions on us are reliable and unbiased.

AI, can I count on you to protect my privacy and give me security that my personal data is not abused in the hands of the privileged few?

AI, can I rely on you to be ethical, unbiased and give me the confidence that your decisions and actions are for the benefit and the good of me, myself and I?

My AI friends (maybe)

As I have said before, I am not a skeptic. When there is plenty of relevant, unbiased data fed into the algorithms of AI, the decisions are fair. People accept these AI decisions when the degree of accuracy is very close to the Truth. The higher the accuracy, the greater the Truth. The greater the Truth, the more confident people are towards the AI system.

Here are some AI “friends” in the news:

But we have to careful here as well. Accuracy can be subjective, paradoxical and enigmatic. When ethics are violated, we terminate the friendship and we reject the “friend”. We categorically label him or her as an enemy. We constantly have to check, just like we might, once in a while, investigate on our friends too.

In Conclusion

AI, can we be friends now?

[Apology: sorry about the Cyberdyne link 😉 ]

[This blog was posted in LinkedIn on Apr 19th 2019]

Figuring out storage for Kubernetes and containers

Oops! I forgot about you!

To me, containers and container orchestration (CO) engines such as Kubernetes, Mesos, Docker Swarm are fantastic. They scale effortlessly and are truly designed for cloud native applications (CNA).

But one thing irks me. Storage management for containers and COs. It was as if when they designed and constructed containers and the containers orchestration (CO) engines, they forgot about the considerations of storage and storage management. At least the persistent part of storage.

Over a year ago, I was in two minds about persistent storage, especially when it comes to the transient nature of microservices which was so prevalent and were inundating the cloud native applications landscape. I was searching for answers in my blog. The decentralization of microservices in containers means mass deployment at the edge, but to have the pre-processed and post-processed data stick to the persistent storage at the edge device is a challenge. The operative word here is “STICK”.

Two different worlds

Containers were initially designed and built for lightweight applications such as microservices. The runtime, libraries, configuration files and dependencies are all in one package. They were meant to do simple tasks quickly and scales to thousands easily. They could be brought up and brought down in little time and did not have to bother about the persistent data stored by the host. The state of the containers were also not important to the application tasks at hand.

Today containers like Docker have matured to run enterprise applications and the state of the container is important. The applications must know the state and the health of the container. The container could be in online mode, online but not accepting data mode, suspended mode, paused mode, interrupted mode, quiesced mode or halted mode. Each mode or state of the container is important to the running applications and the container can easily brought up or down in an instance of a command. The stateful nature of the containers and applications is critical for the business. The same situation applies to container orchestration engines such as Kubernetes.

Container and Kubernetes Storage

Docker provides 3 methods to local storage. In the diagram below, it describes:

Continue reading

Data Privacy First before AI Framework

A few days ago, I discovered that Malaysia already had plans for a National Artificial Intelligence (AI) Framework. It is led by Malaysia Digital Economy Corporation (MDEC) and it will be ready by the end of 2019. A Google search revealed a lot news and announcements, with a few dating back to 2017, but little information of the framework itself. Then again, Malaysia likes to take the “father knows best” approach, and assumes that what it is doing shouldn’t be questioned (much). I will leave this part as it is, because perhaps the details of the framework is under the OSA (Official Secrets Act).

Are we AI responsible or are we responsible for AI?

But I would like to highlight the data privacy part that is likely to figure strongly in the AI Framework, because the ethical use of AI is paramount. It will have economical, social and political impact on Malaysians, and everybody else too. I have written a few articles on LinkedIn about ethics, data privacy, data responsibility, impact of AI. You can read about them in the links below:

I may sound like a skeptic of AI. I am not. I believe AI will benefit mankind, and bring far reaching developments to the society as a whole. But we have to careful and this is my MAIN concern when I voice about AI. I continue to question the human ethics and the human biases that go into the algorithms that define AI. This has always been the crux of my gripes, my concerns, my skepticism of everything we call AI. I am not against AI but I am against the human flaws that shape the algorithms of AI.

Everything is a Sheep (or a Giraffe)

A funny story was shared with me last year. It was about Microsoft Azure computer vision algorithm in recognizing visuals in photos. Apparently the algorithm of the Microsoft Azure’s neural network was fed with some overzealous data of sheep (or giraffes), and the AI system started to point out that every spot that it “saw” was either a sheep, or any vertical long ones was a giraffe.

In the photo below, there were a bunch of sheep on a tree. Check out the tags/comments in the red rectangle published by the AI neural network software below and see how both Microsoft Azure and NeutralTalk2 “saw” in the photo. You can read more about the funny story here.

This proves my point that if you feed the learning system and the AI behind it with biased and flawed information, the result can be funny (in this case here) or disastrous. Continue reading

We got to keep more data

Guess which airport has won the most awards in the annual Skytrax list? Guess which airport won 480 awards since its opening in 1981? Guess how this airport did it?

Data Analytics gives the competive edge.

Serving and servicing more than 65 million passengers and travellers in 2018, and growing, Changi Airport Singapore sets a very high level customer service. And it does it with the help of technology, something they call Smart (Service Management through Analytics and Resource Transformation) Airport. In an ultra competitive and cut-throat airline business, the deep integration of customer-centric services and the ultimate traveller’s experience are crucial to the survival and growth of airlines. And it has definitely helped Singapore Airlines to be the world’s best airlines in 2018, its 4th win.

To achieve that, Changi Airport relies on technology and lots of relevant data for deep insights on how to serve its customers better. The details are well described in this old news article.

Keep More Relevant Data for Greater Insights

When I mean more data, I do not mean every single piece of data. Data has to be relevant to be useful.

How do we get more insights? How can we teach systems to learn? How to we develop artificial intelligence systems? By having more relevant data feeding into data analytics systems, machine learning and such.

As such, a simple framework for building from the data ingestion, to data repositories to outcomes such as artificial intelligence, predictive and recommendations systems, automation and new data insights isn’t difficult to understand. The diagram below is a high level overview of what I work with most of the time. Continue reading

Malaysia, when will you take data privacy seriously?

It is sad. I get about 5-10 silly calls a week and a bunch of nonsense messages in my WhatsApp text and SMS. They waste my time, and it has been going on for years. Even worse is that my private details are out there, exposed and likely be abused too.

Once I got a call from a municipal attorney in the state of Kelantan that I have unpaid summons of several thousand ringgit. They have phone number, my IC number and they threatened to send me a note to arrest me if I didn’t pay up. The thing is, I have never been to Kelantan and I challenged them to send the attorney letter to my home address. The guy on the phone hung up.

In this age where digital information is there at our finger tips, the private details of victims are out there, easily used for unsavoury gains. And we as Malaysians should not shrug our shoulders and not assume that everything is like that, as if it is a Malaysia way of life. That apathy, our state of indifference, should be wiped out from our attitude. We should question the government, the agencies of why is our privacy not protected?

We have the Personal Data Protection Act, ratified in 2010. I don’t know the details of the act, but in its most basic form, don’t you think our private details should at least be protected from the telemarketers calling us selling their personal loans, time share travel suites, private massage (with benefits?) and other silly stuff? How can an act, as a law, be so toothless? Why bother drafting the act, and going through multiple iterations, I would suppose, and making it a law, and yet remain so unworthy to be called a act? Continue reading

The full force of Western Digital

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in the Silicon Valley USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

3 weeks after Storage Field Day 18, I was still trying to wrap my head around the 3-hour session we had with Western Digital. I was like a kid in a candy store for a while, because there were too much to chew and I couldn’t munch them all.

From “Silicon to System”

Not many storage companies in the world can claim that mantra – “From Silicon to Systems“. Western Digital is probably one of 3 companies (the other 2 being Intel and nVidia) I know of at present, which develops vertical innovation and integration, end to end, from components, to platforms and to systems.

For a long time, we have always known Western Digital to be a hard disk company. It owns HGST, SanDisk, providing the drives, the Flash and the Compact Flash for both the consumer and the enterprise markets. However, in recent years, through 2 eyebrow raising acquisitions, Western Digital was moving itself up the infrastructure stack. In 2015, it acquired Amplidata. 2 years later, it acquired Tegile Systems. At that time, I was wondering why a hard disk manufacturer was buying storage technology companies that were not its usual bread and butter business.

Continue reading

Clever Cohesity

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in the Silicon Valley USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

This is clever. This is very smart.

The moment the Cohesity App Marketplace pitch was shared at the Storage Field Day 18 session, somewhere in my mind, enlightenment came to me.

The hyperconverged platform for secondary data, or is it?

When Cohesity came into the scene, they were branded the latest unicorn alongside Rubrik. Both were gunning for the top hyperconverged platform for secondary data. Crazy money was pouring into that segment – Cohesity got USD250 million in June 2018; Rubrik received USD261 million in Jan 2019 – making the market for hyperconverged platforms for secondary data red-hot. Continue reading

VAST Data must be something special

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in the Silicon Valley USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

Vast Data coming out bash!

The delegates of Storage Field Days were always the lucky bunch. We have witnessed several storage technology companies coming out of stealth at these Tech Field Days. The recent ones in memory for me were Excelero and Hammerspace. But to have one where the venerable storage doyen, Mr. Howard Marks, Vast Data new tech evangelist, to introduce the deep dive of Vast Data technology was something special.

For those who knew Howard, he is fiercely independent, very storage technology smart, opinionated and not easily impressed. As a storage technology connoisseur myself, I believe Howard must have seen something special in Vast Data. They must be doing something extremely unique and impressive that someone like Howard could not resist, and made him jump to the vendor side. This sets the tone of my blog.

Continue reading