Storage Performance Considerations for AI Data Paths

The hype around Deep Learning (DL), Machine Learning (ML) and Artificial Intelligence (AI) has reached an unprecedented frenzy. Every infrastructure vendor, from servers to networking to storage, has something to say about, or a play in, DL/ML/AI. This prompted me to explore this hyped ecosystem from a storage perspective, notably from a storage performance requirement point-of-view.

One question on my mind

There are plenty of questions on my mind. One stood out and that is related to storage performance requirements.

Reading and learning from one storage technology vendor to another, the gist of everyone’s play against their competitors seems to be “They are archaic, they are legacy. Our architecture is built from the ground up, modern, NVMe-enabled”. There is more juxtaposing than that, but you get the picture: “We are better, no doubt”.

Are the data patterns and behaviours of AI different? How do they affect the storage design as the data moves through the workflow, the data paths and the lifecycle of the AI ecosystem?

The Heart of Digital Transformation is …

Businesses have taken up Digital Transformation in different ways and at different paces. In Malaysia, company boardrooms are accepting Digital Transformation as a core strategic initiative, crucial to developing competitive advantage in their respective industries. Time and time again, we are reminded that Data is the lifeblood and that Data fuels the Digital Transformation initiatives.

The rise of CDOs

In line with the rise of the Digital Transformation buzzword, I have seen several unique job titles emerge over the past few years. Among those titles, “Chief Digital Officer”, “Chief Data Officer” and “Chief Experience Officer” are some of the eye-catching ones. I have met a few of them, and so far, those I met were outward facing and customer facing. In most of my conversations with them, they projected a front that their organization, their business and their operations have been digitally transformed. They are ready to help their customers to transform. Are they?

Tech vendors add more fuel

The technology vendors have an agenda to sell their solutions and their services. They paint aesthetically pleasing stories of how their solutions and wares can digitally transform any organization, and customers latch on to this ‘shiny’ tech. End users get too fixated on the idea that technology is the core of Digital Transformation. They are wrong.

Missing the Forest

As I gather more insights through observations, conversations and experiences, I think most of the “digital transformation ready” organizations are not adopting the right approach to Digital Transformation.

Digital Transformation is not tactical. It is not a one-time, big bang action that shifts from not-digitally-transformed to digitally-transformed in a moment. It is not a sprint. It is a marathon. It is a journey that will take time to mature. IDC’s Digital Transformation MaturityScape Framework was spot-on when it was first released years ago.

IDC Digital Transformation Maturityscape

Whither HPC, HPE?

HPE is acquiring Cray Inc. Almost 3 years ago, HPE acquired SGI. Back in 2017, HPE partnered with WekaIO, and it invested big in WekaIO’s latest Series C funding round just weeks ago.

Cray, SGI and WekaIO are all strong HPC technology companies. Given the strong uptick in the HPC market, especially commercial HPC, we cannot deny HPE’s ambition to become the top SuperComputing and HPC vendor in the industry.

Scaling new HPC with Composable Architecture

[Disclosure: I was invited by Dell Technologies as a delegate to their Dell Technologies World 2019 Conference from Apr 29-May 1, 2019 in Las Vegas, USA. Tech Field Day Extra was an included activity as part of Dell Technologies World. My expenses, travel, accommodation and conference fees were covered by Dell Technologies, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

Deep Learning, Neural Networks, Machine Learning and subsequently Artificial Intelligence (AI) are the new generation of applications and workloads for commercial HPC systems. They are different from the traditional, more scientific and engineering HPC workloads. I have written about the new dawn of supercomputing and the attractive posture of commercial HPC.

Don’t be idle

From the business perspective, the investment in HPC systems is high most of the time, and justifying it to the executives and the investors is not easy. Therefore, it is critical to keep feeding the HPC systems and to minimize the idle time of compute, GPUs, network and storage.

However, almost all HPC systems today are inflexible. Once assigned to a project, the resources pretty much stay with the project, even when the workload processing of the project is idle and waiting. Of course, we have to bear in mind that not all resources are fully abstracted, virtualized and software-defined such that you can carve out pieces of the hardware and deliver a percentage of that resource. A case in point is the CPU, where you cannot assign half of the clock cycles to one project and the other half to another. The technology isn’t there yet. Certain resources, like GPUs, are going down the path of Virtual GPU, and into the realm of resource disaggregation. Eventually, all resources of the HPC systems (CPU, memory, FPGA, GPU, PCIe channels, NVMe paths, IOPS, bandwidth, burst buffers etc.) should be disaggregated and pooled for disparate applications and workloads based on demands of usage, time and performance.

Hence we are beginning to see disaggregated HPC system resources composed and built up to meet the diverse mix and needs of HPC applications and workloads. This is even more acute when an AI project might grow cold, but the training of AI/ML/DL workloads continues to stay hot.
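To make the pooling idea concrete, here is a minimal sketch in Python of composing resources from a shared pool and releasing them when a project goes idle. It is purely illustrative: the `ResourcePool` class, its methods and the project names are my own hypothetical inventions, not the API of any composable infrastructure product, and it models only GPUs for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """A hypothetical pool of disaggregated GPUs shared by several projects."""
    total_gpus: int
    allocations: dict = field(default_factory=dict)  # project name -> GPUs held

    @property
    def free_gpus(self) -> int:
        # GPUs not currently attached to any project
        return self.total_gpus - sum(self.allocations.values())

    def compose(self, project: str, gpus: int) -> bool:
        """Attach GPUs to a project if enough are free; return True on success."""
        if gpus <= 0 or gpus > self.free_gpus:
            return False
        self.allocations[project] = self.allocations.get(project, 0) + gpus
        return True

    def release(self, project: str) -> int:
        """Return all of a project's GPUs to the pool, e.g. when it goes idle."""
        return self.allocations.pop(project, 0)


pool = ResourcePool(total_gpus=8)
pool.compose("training", 6)                    # the hot workload takes most of the pool
granted = pool.compose("analytics", 4)         # denied: only 2 GPUs are free
pool.release("training")                       # training goes idle, GPUs return to the pool
granted_later = pool.compose("analytics", 4)   # the same request now succeeds
```

In the inflexible model, the 6 GPUs would stay with the training project while it sits idle. In the pooled model, they become available to the next workload as soon as they are released.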

Liqid the early leader in Composable Architecture

Connecting ideas and people with Dell Influencers

[Disclosure: I was invited by Dell Technologies as a delegate to their Dell Technologies World 2019 Conference from Apr 29-May 1, 2019 in Las Vegas, USA. My expenses, travel, accommodation and conference fees were covered by Dell Technologies, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

I just got home from Vegas yesterday after attending my 2nd Dell Technologies World as one of the Dell Luminaries. The conference was definitely a bigger one than the one last year, with more than 15,000 attendees. And there was a frenzy of announcements, from Dell Technologies Cloud to new infrastructure solutions, and more. The big one for me, obviously was Azure VMware Solutions officiated by Microsoft CEO Satya Nadella and VMware CEO Pat Gelsinger, with Michael Dell bringing together the union. I blogged about Dell jumping into the cloud in a big way.

AI Tweetup

In the razzmatazz, the most memorable moments were one of the Tweetups organized by Dr. Konstanze Alex (Konnie) and her team, and Tech Field Day Extra.

Tweetup was alien to me. I didn’t know how the concept worked, and I had to google “tweetup” beforehand. There were a few tweetups on the topics of data protection and 5G, but the one that stood out for me was the AI tweetup.

Is AI my friend?

I am sorry, Dave …

Let’s start this story with 2 supposed friends – Dave and Hal.

How do we become friends?

We have friends and we have enemies. We become friends when trust is established. Trust is established when there is an unsaid pact, a silent agreement that I can rely on you to keep my secrets private. I will know full well that you will protect my personal details with a strong conviction. Your decisions and your actions towards me are in my best interest, unbiased and would benefit both me and you.

I feel secure with you.

AI is my friend

When the walls of uncertainty and falsehood are broken down, we trust our friends more and more. We share deeper secrets with our friends when we believe that our privacy and safety are safeguarded and protected. We know well that we can rely on them, and that their decisions and actions towards us are reliable and unbiased.

AI, can I count on you to protect my privacy and give me security that my personal data is not abused in the hands of the privileged few?

AI, can I rely on you to be ethical, unbiased and give me the confidence that your decisions and actions are for the benefit and the good of me, myself and I?

My AI friends (maybe)

As I have said before, I am not a skeptic. When there is plenty of relevant, unbiased data fed into the algorithms of AI, the decisions are fair. People accept these AI decisions when the degree of accuracy is very close to the Truth. The higher the accuracy, the greater the Truth. The greater the Truth, the more confident people are towards the AI system.

Here are some AI “friends” in the news:

But we have to be careful here as well. Accuracy can be subjective, paradoxical and enigmatic. When ethics are violated, we terminate the friendship and we reject the “friend”. We categorically label him or her as an enemy. We constantly have to check, just like we might, once in a while, check up on our friends too.

In Conclusion

AI, can we be friends now?

[Apology: sorry about the Cyberdyne link 😉 ]

[This blog was posted in LinkedIn on Apr 19th 2019]

Data Privacy First before AI Framework

A few days ago, I discovered that Malaysia already had plans for a National Artificial Intelligence (AI) Framework. It is led by Malaysia Digital Economy Corporation (MDEC) and it will be ready by the end of 2019. A Google search revealed a lot of news and announcements, with a few dating back to 2017, but little information on the framework itself. Then again, Malaysia likes to take the “father knows best” approach, and assumes that what it is doing shouldn’t be questioned (much). I will leave this part as it is, because perhaps the details of the framework are under the OSA (Official Secrets Act).

Are we AI responsible or are we responsible for AI?

But I would like to highlight the data privacy part that is likely to figure strongly in the AI Framework, because the ethical use of AI is paramount. It will have economic, social and political impact on Malaysians, and everybody else too. I have written a few articles on LinkedIn about ethics, data privacy, data responsibility and the impact of AI. You can read about them in the links below:

I may sound like a skeptic of AI. I am not. I believe AI will benefit mankind, and bring far reaching developments to society as a whole. But we have to be careful, and this is my MAIN concern when I voice my views on AI. I continue to question the human ethics and the human biases that go into the algorithms that define AI. This has always been the crux of my gripes, my concerns, my skepticism of everything we call AI. I am not against AI but I am against the human flaws that shape the algorithms of AI.

Everything is a Sheep (or a Giraffe)

A funny story was shared with me last year. It was about Microsoft Azure’s computer vision algorithm for recognizing visuals in photos. Apparently Microsoft Azure’s neural network was fed with some overzealous data of sheep (or giraffes), and the AI system started to point out that every spot it “saw” was a sheep, and that anything long and vertical was a giraffe.

In the photo below, there were a bunch of sheep on a tree. Check out the tags/comments in the red rectangle published by the AI neural network software below and see what both Microsoft Azure and NeuralTalk2 “saw” in the photo. You can read more about the funny story here.

This proves my point that if you feed the learning system and the AI behind it with biased and flawed information, the result can be funny (as in this case) or disastrous.
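A toy sketch in Python shows how skewed training data alone can produce this “everything is a sheep” behaviour. To be clear, this is not how Microsoft Azure or any real vision model works. The `train_majority_tagger` function, the labels and the photo names are all hypothetical, and the “model” is deliberately the crudest one possible: it only remembers the most common label it was trained on.

```python
from collections import Counter


def train_majority_tagger(labels):
    """Return a 'model' that always predicts the most common training label."""
    most_common_label, _ = Counter(labels).most_common(1)[0]
    return lambda image: most_common_label


# Hypothetical, heavily skewed training labels: 95 sheep and only 5 cows.
training_labels = ["sheep"] * 95 + ["cow"] * 5
tagger = train_majority_tagger(training_labels)

# Hypothetical test photos paired with their true labels.
test_set = [
    ("photo_of_cows_in_field", "cow"),
    ("photo_of_sheep_on_hill", "sheep"),
    ("photo_of_rocks", "rock"),
]

# The tagger ignores the photo entirely and repeats what it saw most often.
predictions = [tagger(name) for name, _ in test_set]
correct = sum(pred == truth for pred, (_, truth) in zip(predictions, test_set))
accuracy = correct / len(test_set)
```

Every photo comes back tagged as a sheep, so the tagger is right only when the photo really is one. A real neural network is far more sophisticated, but the lesson is the same: what it predicts is shaped by what it was fed.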

We got to keep more data

Guess which airport has won the most awards in the annual Skytrax list? Guess which airport won 480 awards since its opening in 1981? Guess how this airport did it?

Data Analytics gives the competitive edge.

Serving and servicing more than 65 million passengers and travellers in 2018, and growing, Changi Airport Singapore sets a very high level of customer service. And it does it with the help of technology, something they call Smart (Service Management through Analytics and Resource Transformation) Airport. In an ultra competitive and cut-throat airline business, the deep integration of customer-centric services and the ultimate traveller’s experience are crucial to the survival and growth of airlines. And it has definitely helped Singapore Airlines to be the world’s best airline in 2018, its 4th win.

To achieve that, Changi Airport relies on technology and lots of relevant data for deep insights on how to serve its customers better. The details are well described in this old news article.

Keep More Relevant Data for Greater Insights

When I mean more data, I do not mean every single piece of data. Data has to be relevant to be useful.

How do we get more insights? How can we teach systems to learn? How do we develop artificial intelligence systems? By having more relevant data feeding into data analytics systems, machine learning and such.

As such, a simple framework that builds from data ingestion, to data repositories, to outcomes such as artificial intelligence, predictive and recommendation systems, automation and new data insights isn’t difficult to understand. The diagram below is a high level overview of what I work with most of the time.

The full force of Western Digital

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

3 weeks after Storage Field Day 18, I was still trying to wrap my head around the 3-hour session we had with Western Digital. I was like a kid in a candy store for a while, because there was too much to chew and I couldn’t munch it all.

From “Silicon to System”

Not many storage companies in the world can claim that mantra: “From Silicon to Systems”. Western Digital is probably one of 3 companies (the other 2 being Intel and NVIDIA) I know of at present, which develops vertical innovation and integration, end to end, from components, to platforms and to systems.

For a long time, we have always known Western Digital to be a hard disk company. It owns HGST, SanDisk, providing the drives, the Flash and the Compact Flash for both the consumer and the enterprise markets. However, in recent years, through 2 eyebrow raising acquisitions, Western Digital was moving itself up the infrastructure stack. In 2015, it acquired Amplidata. 2 years later, it acquired Tegile Systems. At that time, I was wondering why a hard disk manufacturer was buying storage technology companies that were not its usual bread and butter business.

WekaIO controls their performance destiny

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

I was first introduced to WekaIO back in Storage Field Day 15. I did not blog about them back then, but I have followed their progress quite attentively throughout 2018. 2 Storage Field Days and a year later, they were back for Storage Field Day 18 with a new CTO, Andy Watson, and several performance benchmark records.

Blowout year

2018 was a blowout year for WekaIO. They experienced over 400% growth, placed #1 in the Virtual Institute IO-500 10-node performance challenge, and also became #1 in the SPEC SFS 2014 performance and latency benchmark. (Note: This record was broken by NetApp a few days later but at a higher cost per client)

The Virtual Institute for I/O IO-500 10-node performance challenge was particularly interesting, because it pitted WekaIO against the Oak Ridge National Lab (ORNL) Summit supercomputer, and WekaIO won. Details of the challenge were listed in Blocks and Files, and the WekaIO Matrix Filesystem became the fastest parallel file system in the world to date.

Control, control and control

I studied WekaIO’s architecture prior to this Field Day. And I spent quite a bit of time digesting and understanding their data paths, I/O paths and control paths, in particular, the diagram below:

Starting from the top right corner of the diagram, applications run on the Linux client (running Weka Client software), and WekaIO presents itself to the Linux client as a POSIX-compliant file system. Through the network, the Linux client interacts with the WekaIO kernel-based VFS (virtual file system) driver, which coordinates the Front End (grey box in upper right corner) to the Linux client. Other client-based protocols such as NFS, SMB, S3 and HDFS are also supported. The Front End then interacts with the NIC (which can be 10/100G Ethernet, InfiniBand, and NVMeoF) through SR-IOV (single root IO virtualization), bypassing the Linux kernel for maximum throughput. This is done with WekaIO’s own networking stack in user space.