Hadoop Clusters – Page 2

Clever Cohesity

By cfheoh | March 11, 2019 - 7:28 pm | March 11, 2019 | Analytics, API, Appliance, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Clusters, Cohesity, Data, Data Archiving, Data Availability, Data Management, Data Protection, Data Security, Deep Learning, Disaster Recovery, Edge Computing, eDiscovery, Filesystems, Fog Computing, Hadoop, Hadoop Clusters, Hyperconvergence, Interica, Machine Learning, Object Storage, Scale-out architecture, Software Defined Storage, Storage Field Day, Tech Field Day, Veritas


[Preamble: I was invited by GestaltIT as a delegate to Storage Field Day 18, one of their Tech Field Day events, from Feb 27 to Mar 1, 2019 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

This is clever. This is very smart.

The moment the Cohesity App Marketplace pitch was shared during the Storage Field Day 18 session, enlightenment dawned on me.

The hyperconverged platform for secondary data, or is it?

When Cohesity came onto the scene, they were branded the latest unicorn alongside Rubrik. Both were gunning to be the top hyperconverged platform for secondary data. Crazy money was pouring into that segment (Cohesity got USD250 million in June 2018; Rubrik received USD261 million in Jan 2019), making the market for hyperconverged platforms for secondary data red-hot.

Sleepless in Malaysia with Object Storage

By cfheoh | January 22, 2019 - 1:42 pm | January 22, 2019 | Amazon, Amazon Web Services, Analytics, API, Big Data, Cloud, Cloudian, Clusters, Data Management, Deep Learning, DellEMC, Dropbox, Filesystems, Google, Hadoop, Hadoop Clusters, HDS, IDC, IoT, iSCSI, Linux, Machine Learning, Microsoft, Minio, NAS, NFS, Object Storage, OpenIO, Redhat, Security, swiftstack

Object Storage? What’s that?

For the past couple of months, I have been speaking with a few parties in Malaysia about object storage technology. And I was fairly surprised by the responses.

The 2 reports

For a start, I did not set out to talk about object storage. It kind of fell into my lap. Two recent Hitachi Vantara reports revealed that countries like Australia, Hong Kong and even South East Asian countries were behind in their understanding of what object storage is and the benefits it brings to the new generation of web scale and enterprise applications.

The first report, an IDC survey sponsored by Hitachi Vantara, found that 41% of enterprises in Australia were not aware of object storage technology. In a similar survey, this one covering Hong Kong and China, the percentages were 38% and 35% respectively. I would presume that the percentages for countries in South East Asia would not be far off either.

How is Malaysia doing?

However, I worry that the percentage could be far more dire in Malaysia. In the past 2 months, responses from several conversations painted a darker picture of object storage technology among companies in Malaysia. These included a reasonably sized hosting company, a well-established systems integrator, a software development company, several OpenStack storage practitioners and a DellEMC regional consultant for unstructured data. The collective conclusion was that object storage technology was not only relatively unknown (probably in line with the percentages in the IDC/Hitachi Vantara reports), but also appeared to be shunned at this juncture. In web scale applications, Red Hat Ceph block and file storage appeared more popular than OpenStack Swift. In enterprise applications, it was a toss-up between iSCSI and NFS.

Image from https://zdnet4.cbsistatic.com/hub/i/r/2018/04/24/c79e9dfb-b4a9-46bb-b831-f2c57fdf8a1d/resize/470xauto/5e4846d1bc7a034c382baf6dcbb612ed/cloud-storage.jpg


Sexy HPC storage is all the rage

By cfheoh | November 26, 2018 - 10:44 am | November 26, 2018 | 100Gigabit Ethernet, Analytics, API, Artificial Intelligence, BeeGFS, CIFS, Clusters, Data Management, Deep Learning, DellEMC, Disks, E8 Storage, EMC, Excelero, Filesystems, Hadoop Clusters, High Performance Computing, Hyperconvergence, IBM, Infiniband, Intel, Linux, Lustre, Machine Learning, Mellanox, Memory Cloud, NAS, NetApp, NFS, Panasas, Performance Benchmark, Performance Caching, Pure Storage, RDMA, Scale-out architecture, SMB, Software-defined Datacenter, Storage Field Day, Tech Field Day, ThinkParq, WekaIO

HPC is sexy

There is no denying it. HPC is sexy. HPC Storage is just as sexy.

Looking at the latest buzz from the Supercomputing Conference 2018 (SC18), which took place in Dallas 2 weeks ago, the number of storage-related vendors participating was staggering. Panasas, Weka.io, Excelero and BeeGFS are the ones I know of, because I have friends posting their highlights. Then there are the perennial vendors like IBM, Dell, HPE, NetApp, Huawei, Supermicro and many more. A quick check on the SC18 website showed 391 exhibitors on the floor.

And this is driven by the relentless demand for higher and higher computing performance, and along with it, the demand for faster and faster storage performance. The commercialization of Artificial Intelligence (AI), Deep Learning (DL) and other newer applications and workloads, together with the traditional HPC workloads, is driving these ever-increasing requirements. However, contrary to what many have been led to believe, most enterprise storage platforms were not designed to meet the demands of this new generation of applications and workloads. Why so?

I had a couple of conversations with a few well-known vendors on the topic of HPC storage. Several of the responses thrown back were simply to put in Flash and NVMe to meet the high performance demands of HPC storage. In my mind, these responses were too simplistic, even irresponsible. So I wanted to write this blog to share my views on HPC storage, and not just on its performance.

The HPC lines are blurring

I picked up this video (below) a few days ago. It was insideHPC's Rich Brueckner interviewing Dr. Goh Eng Lim, HPE CTO and renowned HPC expert, about the convergence of traditional and commercial HPC applications and workloads.

I liked the conversation in the video because it addressed the 2 different approaches, and I welcomed Dr. Goh's invitation to the commercial HPC community to work with the traditional HPC vendors to help push the envelope towards exascale supercomputing.


The Network is Still the Computer

By cfheoh | October 22, 2018 - 11:35 am | October 22, 2018 | 100Gigabit Ethernet, 10Gigabit Ethernet, Analytics, API, Artificial Intelligence, Big Data, Cisco, Clusters, Data Management, Deep Learning, Disks, Drivescale, Fibre Channel, Filesystems, Hadoop, Hadoop Clusters, High Performance Computing, Infiniband, iSCSI, Linux, Machine Learning, MapReduce, NFS, NVMe, PCIe, Performance Benchmark, RDMA, Scale-out architecture, Storage Field Day, Tech Field Day, Virtualization


[Preamble: I was invited by GestaltIT as a delegate to their Tech Field Day from Oct 17-19, 2018 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

Sun Microsystems coined the phrase “The Network is the Computer”. It became one of the most powerful ideologies in the computing world, and over the years many technology companies have tried to emulate and practise the mantra, but fell short.

I had never heard of Drivescale. It wasn't on my radar until the legendary NFS guru, Brian Pawlowski, joined them in April this year. Beepy, as he is known, was CTO at NetApp and later at Pure Storage, and has held many technology leadership roles, including leading the development of NFSv3 and v4.

Prior to Tech Field Day 17, I was given some “homework”. Stephen Foskett, Chief Cat Herder (as he is known) of Tech Field Days and Storage Field Days, highly recommended Drivescale and asked the delegates to read up on their technology. As I went through a couple of the videos, Drivescale's message and philosophy resonated well with me. Perhaps it was their Sun Microsystems DNA? Many of the Drivescale team members came from Sun, and I was previously from Sun as well. I was drinking Sun's Kool-Aid by the bucketload even before I graduated in 1991, so what Drivescale preached made a lot of sense to me.

Drivescale is all about scale-out architecture at webscale, to address data processing at massive scale. To understand it more deeply, we must think about “Data Locality” and “Data Mobility”. I frequently use these 2 “points of discussion” in my consulting practice when architecting and designing data center infrastructure. The gist of data locality is simple: the closer the data is to the processing, the cheaper, lighter and more efficient the processing gets. Moving data, the data mobility part, is expensive.
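To make the data locality point a little more concrete, here is a minimal Python sketch of locality-aware task placement, loosely modelled on the node-local, rack-local and off-rack preference found in Hadoop-style schedulers. It is not Drivescale's method. The cluster layout, node names and relative cost weights are hypothetical, chosen only to show the ordering: run the work where the data already sits, and move data only when you must.

```python
# Hypothetical relative costs of reading one data block at each locality level.
# Real figures depend on disks and network; only the ordering matters here.
LOCALITY_COST = {"node-local": 1, "rack-local": 10, "off-rack": 100}


def locality(node: str, replica_nodes: set[str], rack_of: dict[str, str]) -> str:
    """Classify how close `node` is to the nodes holding replicas of a block."""
    if node in replica_nodes:
        return "node-local"      # the data already sits on this node's disks
    if any(rack_of[r] == rack_of[node] for r in replica_nodes):
        return "rack-local"      # the data stays within the same rack
    return "off-rack"            # the data has to cross between racks


def pick_node(free_nodes: list[str], replica_nodes: set[str],
              rack_of: dict[str, str]) -> tuple[str, str]:
    """Send the task to the free node with the cheapest access to its data."""
    best = min(free_nodes,
               key=lambda n: LOCALITY_COST[locality(n, replica_nodes, rack_of)])
    return best, locality(best, replica_nodes, rack_of)


if __name__ == "__main__":
    # Hypothetical 4-node cluster spread over two racks.
    rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
    replicas = {"n1"}                                  # block stored only on n1
    print(pick_node(["n4", "n2"], replicas, rack_of))  # ('n2', 'rack-local')
    print(pick_node(["n4", "n1"], replicas, rack_of))  # ('n1', 'node-local')
```

The cost table is deliberately crude; the design choice it illustrates is that the scheduler ranks candidate nodes by how far the data would have to travel, rather than by how much compute each node has.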


Own the Data Pipeline

By cfheoh | March 27, 2018 - 9:50 am | March 27, 2018 | Analytics, API, Backup, Big Data, Cloud, Data, Data Archiving, Data Availability, Data Fabric, Data Management, Disaster Recovery, Filesystems, FreeNAS, Hadoop, Hadoop Clusters, HDS, High Performance Computing, Hitachi Vantara, Hyperconvergence, Machine Learning, NAS, NetApp, NFS, Reliability, ROBO, Software Defined Storage, Software-defined Datacenter, Storage Field Day, Storage Tiering, Virtualization


[Preamble: I was a delegate at Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation were paid for by GestaltIT, the organizer, and I was not obligated to blog about or promote the technologies presented at this event. The content of this blog reflects my own opinions and views]

I am a big proponent of Go-to-Market (GTM) solutions. Technology does not stand alone. It must be part of an ecosystem, and every ecosystem is unique, in each industry and in each segment of that industry. When we amalgamate data, the storage infrastructure technologies and data management into the ecosystem, we reap the benefits within that ecosystem.

Data moves within the ecosystem, from system to system, north to south, east to west and vice versa, in random, sequential and ad-hoc patterns. Data acquires different statuses, different roles and different relevance over its lifecycle through the ecosystem. From this, we derive the flow, a workflow of data that forms a data pipeline. The data pipeline concept has been around since the inception of data.

To illustrate my point, I created one for the upstream Oil & Gas Exploration & Production (EP) segment some years ago.
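The idea of data taking on different statuses as it flows from system to system can be sketched in a few lines of Python. This is a minimal illustration with hypothetical system and status names, not a reproduction of the EP pipeline mentioned above: each hop records where the data went and the role it took on there, and the resulting trail is the pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """A piece of data plus the trail of systems and statuses it has passed through."""
    name: str
    status: str = "created"
    trail: list[tuple[str, str]] = field(default_factory=list)


def run_pipeline(data: DataSet, stages: list[tuple[str, str]]) -> DataSet:
    """Move `data` through each (system, status) stage in order.

    Every hop records where the data went and the role it took on there.
    """
    for system, status in stages:
        data.trail.append((system, status))
        data.status = status  # the same data now plays a different role
    return data


if __name__ == "__main__":
    # Hypothetical systems and statuses, for illustration only.
    stages = [
        ("acquisition-nas", "raw"),
        ("processing-cluster", "processed"),
        ("analytics-platform", "analysed"),
        ("archive-tier", "archived"),
    ]
    survey = run_pipeline(DataSet("dataset-001"), stages)
    print(survey.status)                                    # archived
    print(" -> ".join(system for system, _ in survey.trail))
    # acquisition-nas -> processing-cluster -> analytics-platform -> archive-tier
```

Keeping the trail alongside the data, rather than only its current status, is what turns a set of storage systems into a pipeline you can reason about, for example when deciding which tier a dataset should live on at each stage.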


Considerations of Hadoop in the Enterprise

By cfheoh | September 9, 2016 - 10:10 pm | September 10, 2016 | 10Gigabit Ethernet, Data Management, Deduplication, Filesystems, Flash, Hadoop, Hadoop Clusters, High Performance Computing, MapReduce, NetApp, Performance Caching, RAID, Reliability, Server SAN, Software Defined Storage, Solid State Devices, Storage Optimization, Storage Tiering, Virtualization


I am guilty. I have not been tending this blog for quite a while now, but it feels good to be back. What have I been doing? Since leaving NetApp 2 months or so ago, I have been active on the scene again. This time I am more aligned towards data analytics and its burgeoning impact on the storage networking segment.

I was intrigued by an article posted by a friend of mine on Facebook. The article (circa 2013) was titled “Never, ever do this to Hadoop”. It described the author's gripe with SAN bigots. I have encountered storage professionals who throw in a SAN solution every time, because that is all they know. NAS, to them, is like that old relative who smells of camphor oil, and they avoid NAS like the plague. Similarly, DAS was frowned upon, but how things have changed. The pendulum has swung back to DAS, and new market segments such as VSANs and hyperconverged platforms have been dominating the scene in the past 2 years. I highlighted this in my blog, “Praying to the Hypervisor God”, almost 2 years ago.

I agree with the author, Andrew C. Oliver. The “locality” of resources is central to Hadoop’s performance.

Consider these 2 models:

In the model on your left (Moving Data to Compute), the delivery process from Storage to Compute is HEAVY. That is because data has dependencies; data has gravity. However, if you consider the model on your right (Moving Compute to Data), delivering data processing to the storage layer is much lighter. Compute, or data processing, is transient, and the data in the compute layer is volatile. Once the compute layer is powered off, everything starts again from a clean slate, hence the volatility.
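A back-of-the-envelope calculation shows why the left-hand model is so much heavier. The Python sketch below assumes a hypothetical 1 TB dataset, a 50 MB job package, a 1 GB result and a single 10 Gbit/s link, and it ignores protocol overhead, latency and disk read time (which both models pay). Under those assumptions, only the bytes that cross the network differ between Moving Data to Compute and Moving Compute to Data.

```python
def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Idealised time to send `num_bytes` over a link of `link_gbps` gigabits per second."""
    return num_bytes * 8 / (link_gbps * 1e9)


def compare(dataset_bytes: float, job_bytes: float, result_bytes: float,
            link_gbps: float) -> tuple[float, float]:
    """Return network time for (moving data to compute, moving compute to data)."""
    # Left model: the whole dataset travels from storage to the compute layer.
    data_to_compute = transfer_seconds(dataset_bytes, link_gbps)
    # Right model: only the small job goes out and only the results come back.
    compute_to_data = transfer_seconds(job_bytes + result_bytes, link_gbps)
    return data_to_compute, compute_to_data


if __name__ == "__main__":
    # Hypothetical sizes: 1 TB dataset, 50 MB job, 1 GB result, 10 Gbit/s link.
    heavy, light = compare(1e12, 50e6, 1e9, 10)
    print(f"data to compute: {heavy:.0f} s, compute to data: {light:.2f} s")
    # data to compute: 800 s, compute to data: 0.84 s
```

With these made-up figures, shipping the data costs about 800 seconds of network time against under a second for shipping the compute. Real clusters will differ, but the ratio of bytes moved is what gives data its "gravity".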
