distributed storage Archives

Persistent Storage could stifle Google Anthos multi-cloud ambitions

By cfheoh | July 27, 2020 - 9:15 am |July 26, 2020 Amazon Web Services, Analytics, API, Big Data, Cloud, Clusters, Containers, Data, Data Management, DellEMC, Docker, Google, Google Anthos, HPE, Kubernetes, Microsoft, Microsoft Azure, NetApp, Object Storage, Portworx, Pure Storage, Robin.io

BigQuery Omni conversation starter

Two weeks ago, while the Google Cloud BigQuery Omni announcement was still under wraps, local Malaysian IT portal Enterprise IT News sent me the embargoed article to seek my views and opinions. I have to admit that I was ignorant of the deeper workings of BigQuery, and had not fully gone through the workings of Google Anthos either. So I researched them.

Having done some small projects on Qubida (defunct) and Talend several years ago, I had grasped useful data analytics and data enablement concepts, and so BigQuery fitted into my understanding of BigQuery Omni quite well. That triggered my interest in writing this blog and in meshing the persistent storage conundrum (at least for me, it is something to be untangled) with Kubernetes, with GKE (Google Kubernetes Engine), and thus with Anthos as well.

For discussion's sake, here is an overview of BigQuery Omni.

An overview of Google Cloud BigQuery Omni on multiple cloud providers

My comments and views are in this EITN article “Google Cloud’s BigQuery Omni for Multi-cloud Analytics”.

Continue reading →

Paradigm shift of Dev to Storage Ops

By cfheoh | March 2, 2020 - 5:47 am |March 2, 2020 Amazon Web Services, API, Artificial Intelligence, Ceph, Cloud, Composable Infrastructure, Containers, Data Management, Deep Learning, Docker, Drivescale, Edge Computing, Filesystems, Hadoop Clusters, High Performance Computing, IBM, Kubernetes, Linux, Liqid, Machine Learning, Minio, Object Storage, Performance Benchmark, Redhat, Scale-out architecture, Software Defined Storage, Storage Field Day, Tech Field Day, VMware

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in Silicon Valley, USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer, and I was not obligated to blog or promote the vendors’ technologies presented at the event. The content of this blog reflects my own opinions and views. ]

A funny photo (below) came up on my Facebook feed a couple of weeks back. In an honest way, it depicted how a developer thinks (or does not think) about the storage infrastructure designs and models behind the applications and workloads. It also reminded me of how DBAs used to diss storage engineers: “I don’t care about storage, as long as it is RAID 10”. That was aeons ago 😉

The world of developers and the world of infrastructure people are vastly different. Since the birth of cloud computing, both worlds have collided, and programmable infrastructure-as-code (IaC) has become part and parcel of cloud native applications. Of course, there is no denying that there is friction.

Welcome to DevOps!

The Kubernetes factor

Containerized applications are quickly defining the cloud native applications landscape. The container orchestration machinery has one dominant engine – Kubernetes.

In the world of software development and delivery, DevOps has taken a liking to containers. Containers make it easier to host and manage the life cycle of web applications inside a portable environment. They package up application code and its dependencies into building blocks to deliver consistency, efficiency, and productivity. To scale to multiple applications and multiple clouds, with thousands and even tens of thousands of microservices in containers, the Kubernetes factor comes into play. Kubernetes handles tasks like auto-scaling, rolling deployments, compute resources, volume storage and much, much more, and it is designed to run on bare metal, in the data center, in the public cloud or even in a hybrid cloud.
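To make the auto-scaling part a little more concrete, here is a minimal sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric). The function name, the parameter names and the min/max defaults are my own illustrative choices, and the sketch leaves out details of the real controller such as its tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Return the replica count suggested by the HPA-style ratio rule.

    The raw value is ceil(current_replicas * current_metric / target_metric),
    then clamped to [min_replicas, max_replicas] (hypothetical defaults).
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 60% target: scale out.
scaled_out = desired_replicas(4, 90.0, 60.0)
# 4 pods averaging 30% CPU against a 60% target: scale in.
scaled_in = desired_replicas(4, 30.0, 60.0)
print(scaled_out, scaled_in)
```

The clamp is what keeps a sudden metric spike from asking for more pods than the cluster is allowed to schedule.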

Continue reading →

Figuring out storage for Kubernetes and containers

By cfheoh | April 16, 2019 - 6:44 pm |April 16, 2019 Amazon Web Services, API, Cloud, Containers, Data, Data Archiving, Data Availability, Data Management, Data Protection, Data Security, Docker, Edge Computing, Elastifile, Fibre Channel, Filesystems, Google, Hyperconvergence, Kubernetes, NFS, Openstack, Performance Benchmark, Performance Caching, Robin.io, Snapshots, Software Defined Storage, Storage Optimization, Virtualization, VMware

Oops! I forgot about you!

To me, containers and container orchestration (CO) engines such as Kubernetes, Mesos and Docker Swarm are fantastic. They scale effortlessly and are truly designed for cloud native applications (CNA).

But one thing irks me: storage management for containers and COs. It is as if, when they designed and constructed containers and the container orchestration engines, they forgot about storage and storage management. At least the persistent part of storage.

Over a year ago, I was in two minds about persistent storage, especially given the transient nature of microservices, which were so prevalent and were inundating the cloud native applications landscape. I was searching for answers in my blog. The decentralization of microservices in containers means mass deployment at the edge, but getting the pre-processed and post-processed data to stick to the persistent storage at the edge device is a challenge. The operative word here is “STICK”.
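For readers who have not seen how Kubernetes asks for storage that “sticks”, here is a minimal sketch that builds a PersistentVolumeClaim manifest as a plain Python dictionary and prints it as JSON (kubectl accepts JSON manifests as well as YAML). The claim name `edge-data`, the size `10Gi` and the storage class `local-ssd` are hypothetical values for illustration, not anything from a real cluster.

```python
import json


def build_pvc(name: str, size: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a dictionary.

    The field names follow the core v1 PersistentVolumeClaim schema;
    the values passed in below are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim for an edge workload that needs its data to persist.
manifest = build_pvc("edge-data", "10Gi", "local-ssd")
print(json.dumps(manifest, indent=2))
```

A pod then refers to the claim by name, so the data outlives any single container that mounts it.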

Two different worlds

Containers were initially designed and built for lightweight applications such as microservices. The runtime, libraries, configuration files and dependencies are all in one package. They were meant to do simple tasks quickly and scale to thousands easily. They could be brought up and brought down in little time and did not have to bother about the persistent data stored by the host. The state of the containers was also not important to the application tasks at hand.

Today, containers like Docker have matured to run enterprise applications, and the state of the container is important. The applications must know the state and the health of the container. The container could be in online mode, online but not accepting data mode, suspended mode, paused mode, interrupted mode, quiesced mode or halted mode. Each mode or state of the container is important to the running applications, and the container can easily be brought up or down with a single command. The stateful nature of the containers and applications is critical for the business. The same situation applies to container orchestration engines such as Kubernetes.

Container and Kubernetes Storage

Docker provides 3 methods for local storage. The diagram below describes them:
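As a rough illustration, and assuming the three methods are the ones Docker's own documentation lists (volumes, bind mounts and tmpfs mounts), the sketch below builds the `--mount` arguments for `docker run` for each type. It only assembles and prints the command lines and does not call Docker. The helper name `mount_args`, the volume name `app-data`, the host path `/srv/app` and the image `alpine` are hypothetical.

```python
import shlex
from typing import List, Optional


def mount_args(kind: str, target: str, source: Optional[str] = None) -> List[str]:
    """Return the ['--mount', '<spec>'] pair for one docker run mount.

    kind is 'volume', 'bind' or 'tmpfs'. A tmpfs mount lives in memory
    and takes no source; a bind mount needs a host path as its source.
    """
    if kind not in ("volume", "bind", "tmpfs"):
        raise ValueError(f"unknown mount type: {kind}")
    if kind == "tmpfs" and source is not None:
        raise ValueError("tmpfs mounts take no source")
    if kind == "bind" and source is None:
        raise ValueError("bind mounts need a host path as source")
    parts = [f"type={kind}"]
    if source is not None:
        parts.append(f"source={source}")
    parts.append(f"target={target}")
    return ["--mount", ",".join(parts)]


# Hypothetical examples of each storage type (commands are printed, not run).
examples = [
    mount_args("volume", "/data", source="app-data"),   # Docker-managed volume
    mount_args("bind", "/data", source="/srv/app"),     # host directory
    mount_args("tmpfs", "/scratch"),                    # in-memory, non-persistent
]
for args in examples:
    print(shlex.join(["docker", "run", "--rm", *args, "alpine", "true"]))
```

Only the first two survive the container; the tmpfs mount disappears when the container stops.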

Continue reading →

WekaIO controls their performance destiny

By cfheoh | March 17, 2019 - 5:33 pm |March 17, 2019 Amazon Web Services, Analytics, Appliance, Big Data, CIFS, Cloud, Deep Learning, Filesystems, Flash, High Performance Computing, Infiniband, Linux, Lustre, Machine Learning, Mellanox Technologies, NAS, NetApp, NFS, NVMe, Object Storage, PCIe, Performance Benchmark, Performance Caching, RDMA, Scale-out architecture, SMB, Software Defined Storage, Storage Field Day, Storage Optimization, Storage Tiering, Tech Field Day, Virtualization, WekaIO, Western Digital

[Preamble: I was invited by GestaltIT as a delegate to their Tech Field Day for Storage Field Day 18 from Feb 27-Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog reflects my own opinions and views.]

I was first introduced to WekaIO back at Storage Field Day 15. I did not blog about them back then, but I followed their progress quite attentively throughout 2018. Two Storage Field Days and a year later, they were back for Storage Field Day 18 with a new CTO, Andy Watson, and several performance benchmark records.

Blowout year

2018 was a blowout year for WekaIO. They experienced over 400% growth, placed #1 in the Virtual Institute IO-500 10-node performance challenge, and also became #1 in the SPEC SFS 2014 performance and latency benchmark. (Note: This record was broken by NetApp a few days later, but at a higher cost per client.)

The Virtual Institute for I/O IO-500 10-node performance challenge was particularly interesting, because it pitted WekaIO against the Oak Ridge National Lab (ORNL) Summit supercomputer, and WekaIO won. Details of the challenge were listed in Blocks and Files, and the WekaIO Matrix filesystem became the fastest parallel file system in the world to date.

Control, control and control

I studied WekaIO’s architecture prior to this Field Day, and I spent quite a bit of time digesting and understanding their data paths, I/O paths and control paths, in particular the diagram below:

Starting from the top right corner of the diagram, applications run on the Linux client (running the Weka Client software), and WekaIO presents itself to the Linux client as a POSIX-compliant file system. Through the network, the Linux client interacts with the WekaIO kernel-based VFS (virtual file system) driver, which coordinates the Front End (grey box in the upper right corner) with the Linux client. Other client-based protocols such as NFS, SMB, S3 and HDFS are also supported. The Front End then interacts with the NIC (which can be 10/100G Ethernet, Infiniband, and NVMeoF) through SR-IOV (single root I/O virtualization), bypassing the Linux kernel for maximum throughput. This is done with WekaIO’s own networking stack in user space. Continue reading →

StorPool – Block storage managed well

By cfheoh | March 13, 2019 - 8:34 am |March 13, 2019 100Gigabit Ethernet, API, Cloud, Clusters, Data Availability, Data Management, Disks, Filesystems, High Performance Computing, iSCSI, Linux, Lustre, Nexenta, NVMe, Openstack, Performance Benchmark, Performance Caching, Scale-out architecture, Server SAN, Software Defined Storage, Storage Field Day, Storage Optimization, Storpool, Tech Field Day, Virtualization, VMware

Storage technology is complex. Storage infrastructure and data management operations are not trivial, despite what hyperscalers like Amazon Web Services and Microsoft Azure would like you to think. As the adoption of cloud infrastructure services grows, small and medium businesses/enterprises (SMB/SME) are usually left to their own devices to manage the virtual storage infrastructure. Cloud Service Providers (CSPs) addressing the SMB/SME market are looking for easier, worry-free, software-defined storage to elevate their value to their customers.

Managed high performance block storage

Enter StorPool.

StorPool is a scale-out block storage technology, capable of delivering 1 million+ IOPS with sub-millisecond response times. As described by fellow delegate Ray Lucchesi in his recent blog, they were able to achieve these impressive performance numbers in their demo without a high throughput RDMA network or the storage class memory of Intel Optane. Continue reading →
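To put the IOPS and latency figures side by side, Little's Law (outstanding I/Os = throughput × response time) gives a rough idea of how much concurrency such a system has to sustain. The sketch below is my own back-of-the-envelope arithmetic, not a StorPool figure: the 500 microsecond latency is an assumed value picked only because it is "sub-millisecond".

```python
def outstanding_ios(iops: int, latency_us: int) -> float:
    """Little's Law: average I/Os in flight = IOPS * latency (in seconds).

    latency_us is the average response time in microseconds.
    """
    if iops < 0 or latency_us < 0:
        raise ValueError("iops and latency_us must be non-negative")
    return iops * latency_us / 1_000_000


# Assumed numbers: 1 million IOPS at a 500 microsecond average response time.
in_flight = outstanding_ios(1_000_000, 500)
print(f"about {in_flight:.0f} I/Os in flight on average")
```

In other words, under these assumptions the cluster has to keep hundreds of I/Os in flight at any moment, spread across its clients and drives.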
