Figuring out storage for Kubernetes and containers

Oops! I forgot about you!

To me, containers and container orchestration (CO) engines such as Kubernetes, Mesos and Docker Swarm are fantastic. They scale effortlessly and are truly designed for cloud native applications (CNA).

But one thing irks me: storage management for containers and COs. It is as if, when containers and the CO engines were designed and constructed, the considerations of storage and storage management were forgotten. At least the persistent part of storage.

Over a year ago, I was in two minds about persistent storage, especially given the transient nature of the microservices that were so prevalent and inundating the cloud native applications landscape. I was searching for answers in my blog. The decentralization of microservices in containers means mass deployment at the edge, but having the pre-processed and post-processed data stick to the persistent storage at the edge device is a challenge. The operative word here is "STICK".

Two different worlds

Containers were initially designed and built for lightweight applications such as microservices. The runtime, libraries, configuration files and dependencies are all in one package. They were meant to do simple tasks quickly and to scale to thousands easily. They could be brought up and brought down in little time, and did not have to bother with the persistent data stored by the host. The state of the containers was also not important to the application tasks at hand.

Today containers like Docker have matured to run enterprise applications, and the state of the container is important. The applications must know the state and the health of the container. The container could be in online mode, online but not accepting data mode, suspended mode, paused mode, interrupted mode, quiesced mode or halted mode. Each mode or state of the container is important to the running applications, and the container can easily be brought up or down with a single command. The stateful nature of the containers and applications is critical for the business. The same situation applies to container orchestration engines such as Kubernetes.
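To make that concrete, here is a minimal sketch of my own (not from any vendor documentation) using the Docker SDK for Python to query a container's state; the container name is hypothetical:

```python
import docker

# A minimal sketch: an application that cares about container state can
# query it directly through the Docker API. "my-enterprise-app" is a
# hypothetical container name.
client = docker.from_env()

container = client.containers.get("my-enterprise-app")
print(container.status)  # e.g. "running", "paused", "exited"

# Richer detail, equivalent to `docker inspect`
state = container.attrs["State"]
print(state["Status"], state.get("Health", {}).get("Status", "no healthcheck"))
```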

Container and Kubernetes Storage

Docker provides 3 methods for local storage: volumes, bind mounts and tmpfs mounts.
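As a hedged illustration (my own, assuming the Docker SDK for Python is installed; image names and paths are arbitrary), the three methods look like this:

```python
import docker

client = docker.from_env()

# 1. Named volume -- managed by Docker, persists after the container is removed
client.containers.run(
    "alpine", "ls /data",
    volumes={"my-volume": {"bind": "/data", "mode": "rw"}},
    remove=True,
)

# 2. Bind mount -- maps a host directory into the container
client.containers.run(
    "alpine", "ls /data",
    volumes={"/host/dir": {"bind": "/data", "mode": "rw"}},
    remove=True,
)

# 3. tmpfs mount -- memory-backed, disappears when the container stops
client.containers.run(
    "alpine", "ls /data",
    tmpfs={"/data": "size=64m"},
    remove=True,
)
```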

Continue reading

Data Privacy First before AI Framework

A few days ago, I discovered that Malaysia already had plans for a National Artificial Intelligence (AI) Framework. It is led by the Malaysia Digital Economy Corporation (MDEC) and it will be ready by the end of 2019. A Google search revealed a lot of news and announcements, with a few dating back to 2017, but little information about the framework itself. Then again, Malaysia likes to take the "father knows best" approach, and assumes that what it is doing shouldn't be questioned (much). I will leave this part as it is, because perhaps the details of the framework are under the OSA (Official Secrets Act).

Are we AI responsible or are we responsible for AI?

But I would like to highlight the data privacy part, which is likely to figure strongly in the AI Framework, because the ethical use of AI is paramount. It will have economic, social and political impact on Malaysians, and everybody else too. I have written a few articles on LinkedIn about ethics, data privacy, data responsibility and the impact of AI. You can read about them in the links below:

I may sound like a skeptic of AI. I am not. I believe AI will benefit mankind, and bring far-reaching developments to society as a whole. But we have to be careful, and this is my MAIN concern when I speak about AI. I continue to question the human ethics and the human biases that go into the algorithms that define AI. This has always been the crux of my gripes, my concerns, my skepticism of everything we call AI. I am not against AI but I am against the human flaws that shape the algorithms of AI.

Everything is a Sheep (or a Giraffe)

A funny story was shared with me last year. It was about the Microsoft Azure computer vision algorithm recognizing visuals in photos. Apparently the Microsoft Azure neural network was fed with some overzealous data of sheep (and giraffes), and the AI system started to point out that every spot it "saw" was a sheep, and anything long and vertical was a giraffe.

In the photo below, there were a bunch of sheep in a tree. Check out the tags/comments in the red rectangle published by the AI neural network software below, and see what both Microsoft Azure and NeuralTalk2 "saw" in the photo. You can read more about the funny story here.

This proves my point: if you feed the learning system and the AI behind it biased and flawed information, the results can be funny (as in this case) or disastrous. Continue reading
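To caricature the point, here is a toy sketch of my own (made-up data, nothing to do with the actual Azure system): a trivial "classifier" that always predicts the majority label of its training set. Flood the training data with sheep, and everything becomes a sheep:

```python
from collections import Counter

# Hypothetical, deliberately skewed training labels
training_labels = ["sheep"] * 95 + ["giraffe"] * 3 + ["tree"] * 2

def predict(_image):
    # Ignores the input entirely -- the bias in the data does the "thinking"
    return Counter(training_labels).most_common(1)[0][0]

for photo in ["white_dots_on_hillside.jpg", "tall_brown_shape.jpg"]:
    print(photo, "->", predict(photo))  # both come out as "sheep"
```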

We got to keep more data

Guess which airport has won the most awards in the annual Skytrax list? Guess which airport has won 480 awards since its opening in 1981? Guess how this airport did it?

Data Analytics gives the competitive edge.

Serving and servicing more than 65 million passengers and travellers in 2018, and growing, Changi Airport Singapore sets a very high bar for customer service. And it does it with the help of technology, something they call the Smart (Service Management through Analytics and Resource Transformation) Airport. In an ultra-competitive and cut-throat airline business, the deep integration of customer-centric services and the ultimate traveller's experience is crucial to the survival and growth of airlines. And it has definitely helped Singapore Airlines to be the world's best airline in 2018, its 4th win.

To achieve that, Changi Airport relies on technology and lots of relevant data for deep insights on how to serve its customers better. The details are well described in this old news article.

Keep More Relevant Data for Greater Insights

When I say more data, I do not mean every single piece of data. Data has to be relevant to be useful.

How do we get more insights? How can we teach systems to learn? How do we develop artificial intelligence systems? By having more relevant data feeding into data analytics systems, machine learning and the like.

As such, a simple framework that builds from data ingestion, to data repositories, to outcomes such as artificial intelligence, predictive and recommendation systems, automation and new data insights isn't difficult to understand. The diagram below is a high-level overview of what I work with most of the time. Continue reading
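As a rough sketch of that flow (my own simplification with hypothetical function names, not the actual diagram):

```python
# Conceptual ingestion -> repository -> outcomes flow; all names hypothetical
def ingest(sources):
    """Pull raw records from the data sources."""
    return [record for source in sources for record in source]

def store(records, repository):
    """Land only the relevant records in the data repository."""
    repository.extend(r for r in records if r.get("relevant"))
    return repository

def derive_outcomes(repository):
    """Feed the repository into analytics/ML to produce insights."""
    return {"records_analyzed": len(repository)}

repo = store(ingest([[{"relevant": True}, {"relevant": False}]]), [])
print(derive_outcomes(repo))  # {'records_analyzed': 1}
```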

Malaysia, when will you take data privacy seriously?

It is sad. I get about 5-10 silly calls a week and a bunch of nonsense messages over WhatsApp and SMS. They waste my time, and it has been going on for years. Even worse, my private details are out there, exposed and likely being abused too.

Once I got a call from a municipal attorney in the state of Kelantan claiming that I had unpaid summonses of several thousand ringgit. They had my phone number and my IC number, and they threatened to send me a notice of arrest if I didn't pay up. The thing is, I have never been to Kelantan, and I challenged them to send the attorney's letter to my home address. The guy on the phone hung up.

In this age where digital information is at our fingertips, the private details of victims are out there, easily used for unsavoury gains. And we as Malaysians should not shrug our shoulders and accept that this is just how things are, as if it were the Malaysian way of life. That apathy, our state of indifference, should be wiped out from our attitude. We should question the government and its agencies: why is our privacy not protected?

We have the Personal Data Protection Act, ratified in 2010. I don't know the details of the act, but in its most basic form, don't you think our private details should at least be protected from the telemarketers calling us to sell their personal loans, time-share travel suites, private massages (with benefits?) and other silly stuff? How can an act, as a law, be so toothless? Why bother drafting the act, going through multiple iterations, I would suppose, and making it law, and yet have it remain so unworthy to be called an act? Continue reading

The full force of Western Digital

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

3 weeks after Storage Field Day 18, I was still trying to wrap my head around the 3-hour session we had with Western Digital. I was like a kid in a candy store for a while, because there was too much to chew on and I couldn't munch it all.

From "Silicon to Systems"

Not many storage companies in the world can claim that mantra – "From Silicon to Systems". Western Digital is probably one of only 3 companies I know of at present (the other 2 being Intel and NVIDIA) that develops vertical innovation and integration, end to end, from components, to platforms, to systems.

For a long time, we have known Western Digital as a hard disk company. It owns HGST and SanDisk, providing the drives, the flash and the CompactFlash for both the consumer and enterprise markets. However, in recent years, through 2 eyebrow-raising acquisitions, Western Digital has been moving itself up the infrastructure stack. In 2015, it acquired Amplidata. 2 years later, it acquired Tegile Systems. At the time, I wondered why a hard disk manufacturer was buying storage technology companies that were not its usual bread-and-butter business.

Continue reading

WekaIO controls their performance destiny

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

I was first introduced to WekaIO back at Storage Field Day 15. I did not blog about them back then, but I followed their progress quite attentively throughout 2018. 2 Storage Field Days and a year later, they were back for Storage Field Day 18 with a new CTO, Andy Watson, and several performance benchmark records.

Blowout year

2018 was a blowout year for WekaIO. They experienced over 400% growth, placed #1 in the Virtual Institute IO-500 10-node performance challenge, and also became #1 in the SPEC SFS 2014 performance and latency benchmark. (Note: this record was broken by NetApp a few days later, but at a higher cost per client.)

The Virtual Institute for I/O IO-500 10-node performance challenge was particularly interesting, because it pitted WekaIO against the Oak Ridge National Laboratory (ORNL) Summit supercomputer, and WekaIO won. Details of the challenge were listed in Blocks and Files, and the WekaIO Matrix filesystem became the fastest parallel file system in the world to date.

Control, control and control

I studied WekaIO's architecture prior to this Field Day, and I spent quite a bit of time digesting and understanding their data paths, I/O paths and control paths, in particular the diagram below:

Starting from the top right corner of the diagram, applications run on the Linux client (which runs the Weka Client software), and WekaIO presents itself to the Linux client as a POSIX-compliant file system. Through the network, the Linux client interacts with WekaIO's kernel-based VFS (virtual file system) driver, which coordinates with the Front End (grey box in the upper right corner) on behalf of the Linux client. Other client-based protocols such as NFS, SMB, S3 and HDFS are also supported. The Front End then interacts with the NIC (which can be 10/100G Ethernet, InfiniBand or NVMeoF) through SR-IOV (single root I/O virtualization), bypassing the Linux kernel for maximum throughput, using WekaIO's own networking stack in user space. Continue reading
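The upshot of that data path, as a hedged sketch of my own (the mount point is hypothetical): applications need nothing special; ordinary POSIX calls against the mount work as usual while the kernel bypass happens underneath.

```python
import os

MOUNT = "/mnt/weka"  # hypothetical WekaIO mount point

path = os.path.join(MOUNT, "demo.bin")

# An ordinary write() -- routed through the VFS driver to the Front End
with open(path, "wb") as f:
    f.write(os.urandom(4096))

# An ordinary read() back
with open(path, "rb") as f:
    data = f.read()

print(len(data), "bytes round-tripped through the POSIX interface")
```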

Bridges to the clouds and more – NetApp NDAS

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

The NetApp Data Fabric Vision

The NetApp Data Fabric vision has always been clear to me. Maybe it is because of my 2 stints with them, during which I got well soaked in their culture. 3 simple points define the vision.

  • The Data Fabric is THE data singularity. Data can be anywhere – on-premises, the clouds, and more.
  • Build bridges, paths and workflow management to the data, to move the data to wherever it may be needed.
  • Work with technology partners to build tools and data systems that elevate the value of the data.

That is how I see it. I wrote about the Transcendence of the Data Fabric vision 3+ years ago, and I emphasized the importance of the Data Pipeline in another NetApp blog almost a year ago. The introduction of NetApp Data Availability Services (NDAS) in the recently concluded Storage Field Day 18 was no different as NetApp constructs data bridges and paths to the AWS Cloud.

NetApp Data Availability Services

The NDAS feature is only available with ONTAP 9.5. With fewer than 5 clicks, data from ONTAP primary systems can be backed up to a secondary ONTAP target (running the NDAS proxy and the Copy to Cloud API), and then to AWS S3 buckets in the cloud.
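Once the copies land in S3, they are reachable with standard AWS tooling. A minimal sketch of my own (this is not the NDAS tooling itself, and the bucket name is hypothetical) to eyeball what has been copied:

```python
import boto3

s3 = boto3.client("s3")

# List a few of the backup objects that have been copied into the bucket
response = s3.list_objects_v2(Bucket="ndas-backup-bucket", MaxKeys=10)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"], obj["LastModified"])
```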

Continue reading

StorPool – Block storage managed well

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

Storage technology is complex. Storage infrastructure and data management operations are not trivial, despite what hyperscalers like Amazon Web Services and Microsoft Azure would like you to think. As the adoption of cloud infrastructure services grows, small and medium businesses/enterprises (SMB/SME) are usually left to their own devices to manage the virtual storage infrastructure. Cloud Service Providers (CSPs) addressing the SMB/SME market are looking for easier, worry-free, software-defined storage to elevate their value to their customers.

Managed high performance block storage

Enter StorPool.

StorPool is a scale-out block storage technology, capable of delivering 1 million+ IOPS with sub-millisecond response times. As described by fellow delegate Ray Lucchesi in his recent blog, they were able to achieve these impressive performance numbers in their demo without a high-throughput RDMA network or the storage class memory of Intel Optane. Continue reading
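For context, numbers like these are typically measured with a synthetic I/O load generator such as fio. A hedged sketch of my own (the device path is hypothetical, and the flags are a generic random-read profile, not StorPool's actual test):

```python
import subprocess

# Generic 4K random-read IOPS test with fio against a block device
cmd = [
    "fio", "--name=randread",
    "--filename=/dev/sdb",     # hypothetical StorPool-backed block device
    "--rw=randread", "--bs=4k", "--direct=1",
    "--ioengine=libaio", "--iodepth=32", "--numjobs=4",
    "--time_based", "--runtime=30", "--group_reporting",
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)  # fio reports IOPS and latency percentiles
```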

Clever Cohesity

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

This is clever. This is very smart.

The moment the Cohesity App Marketplace pitch was shared at the Storage Field Day 18 session, somewhere in my mind, enlightenment came to me.

The hyperconverged platform for secondary data, or is it?

When Cohesity came onto the scene, they were branded the latest unicorn alongside Rubrik. Both were gunning to be the top hyperconverged platform for secondary data. Crazy money was pouring into that segment – Cohesity got USD 250 million in June 2018; Rubrik received USD 261 million in Jan 2019 – making the market for hyperconverged platforms for secondary data red-hot. Continue reading

Catch up (fast) – IBM Spectrum Protect Plus

[Preamble: I have been invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

The IBM Spectrum Protect Plus (SPP) team returned for Storage Field Day 18, almost exactly 50 weeks after they introduced SPP to the Storage Field Day 15 delegates in 2018. My comments in my blog about IBM SPP were not flattering, but the product was fairly new back then. I joined the other delegates to listen to IBM again this time around, keeping an open mind to hear about and see their software upgrades.

Spectrum Protect Plus is NOT Spectrum Protect

First of all, it is important to call out that IBM Spectrum Protect (SP) and IBM Spectrum Protect Plus (SPP) are 2 distinct products. SP is the old Tivoli Storage Manager (TSM), while SPP is a more "modern" product catering to virtualized environments and several public cloud service provider target platforms. To date, SP is at version 8.1.x while SPP was introduced as version 10.1.4. There is "some" integration between SP and SPP, where SPP data can be "offloaded" to the SP platform for long-term retention.

For one, I certainly am confused by IBM's marketing and naming of both products, and I am sure many face the same predicament too. Continue reading