MapReduce – Storage Gaga

The All-Important Storage Appliance Mindset for HPC and AI projects

By cfheoh | July 22, 2024 - 7:30 am |November 17, 2024 API, Appliance, Artificial Intelligence, BeeGFS, Big Data, Cloud, Clusters, Data Direct Networks, DDN, Deep Learning, Digital Transformation, Elastifile, Filesystems, Flash, High Performance Computing, Infiniband, iXsystems, Lustre, Machine Learning, MapReduce, NAS, NetApp, NFS, nVidia, NVMe, Object Storage, Parallel NFS, Performance Benchmark, Performance Caching, RDMA, Scale-out architecture, Software Defined Storage, Solid State Devices, Storage Optimization, ThinkParq, WekaIO

Costs Benefits and Risks

I like to think about what the end users are thinking about. There are investment costs involved and, along with them, risks to the investments as well as their benefits. Let’s simplify and lump them into a Cost-Benefits-Risk analysis triangle. These variables come into play in the decision making of AI and HPC projects.

Continue reading →

Rethinking data processing frameworks systems in real time

By cfheoh | December 27, 2021 - 8:00 am |December 26, 2021 Algorithm, Amazon Web Services, Analytics, API, Artificial Intelligence, Confluent, Containers, Data, Data Management, Data Privacy, Data Protection, Data Security, Digital Transformation, Google, Hadoop, Hadoop Clusters, InfluxDB, Machine Learning, MapReduce, Microsoft Azure, Pravega, Scale-out architecture

What the heck is Storage Modernization?

By cfheoh | September 20, 2021 - 7:00 am |September 19, 2021 Acquisition, Analytics, API, Artificial Intelligence, Big Data, Business Continuity, Cloud, Containers, Data, Data Archiving, Data Availability, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Disaster Recovery, Edge Computing, Green Computing, Hadoop, Hadoop Clusters, Kubernetes, Machine Learning, MapReduce, Reliability, Software-defined Datacenter, Solid State Devices, Storage Optimization, Tape storage

Big Data is right

When the term “Big Data” came into prominence a while back, it stirred the IT industry into a frenzy. At one point, Apache Hadoop became the poster elephant (pun intended) for this exciting new segment. So many Vs came out, but I settled on 4 Vs as the framework of my IT conversations. The 4 Vs we often hear are:

Volume
Velocity
Variety
Veracity

Continue reading →

Hadoop is truly dead – LOTR version

By cfheoh | January 24, 2020 - 1:06 pm |January 24, 2020 Acquisition, Analytics, API, Artificial Intelligence, Big Data, Cloud, Cloudera, Containers, Data Management, Data Security, Deep Learning, Digital Transformation, Hadoop, Hadoop Clusters, Kubernetes, MapReduce, NAS, NetApp, Object Storage, Pure Storage, Storage Field Day, Tech Field Day

2 Comments

[Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

I had not planned to write this blog. But a string of events happened during the Storage Field Day 19 week, and I now have the fodder to share my thoughts. Hadoop is indeed dead.

Warning: There are Lord of the Rings references in this blog. You might want to do some research. 😉

Storage metrics never happened

The fellowship of Arjan Timmerman, Keiran Shelden, Brian Gold (Pure Storage) and me started at the office of Pure Storage in downtown Mountain View, much like Frodo Baggins, Samwise Gamgee, Peregrin Took and Meriadoc Brandybuck forging their journey vows at Rivendell. The podcast was supposed to be on the topic of storage metrics, but was unanimously swung to talk about Hadoop under the stewardship of Mr. Stephen Foskett, our host of Tech Field Day. I saw Stephen as Elrond Half-elven, the Lord of Rivendell, moderating the podcast as he would have moderated the plans to destroy the One Ring in Mount Doom.

So there we were talking about Hadoop, or maybe Sauron, or both.

The photo of the Oliphaunt below seemed apt to describe the industry attacks on Hadoop.

Continue reading →

Time to Advocate Common Data Personality

By cfheoh | November 25, 2019 - 12:33 pm |November 25, 2019 Amazon Web Services, Analytics, API, Artificial Intelligence, Big Data, Commvault, Containers, Data Archiving, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, eDiscovery, Hadoop, Hadoop Clusters, HDS, Hedvig, Hitachi Vantara, InfluxDB, Machine Learning, MapReduce, Object Storage

3 Comments

The thought of it has been on my mind since Commvault GO 2019. It was sparked when Don Foster, VP of Storage Solutions at Commvault, answered a question posed by one of the analysts. What he said made a connection, as I was searching for better insight into how Commvault and Hedvig would end up together.

Data Deluge is a swamp thing now

Several years ago, I heard Stephen Brobst, CTO of Teradata, bring up the term “Data Swamp“. It was the antithesis of the Data Lake, and this was back when Data Lakes and Hadoop were all the rage. His comments were raw and honest, and they pointed to the truth out there.

Source: https://www.deviantart.com/rhineville/art/God-that-Crawls-Detail-2-291228644

I was enamoured of his thoughts at the time, and today, his comments about the Data Swamp have manifested themselves.

Continue reading →

Commvault big bet

By cfheoh | September 12, 2019 - 9:03 pm |September 12, 2019 Acquisition, Analytics, API, Appliance, Big Data, Business Continuity, Cisco, Cloud, Cohesity, Commvault, Data Archiving, Data Availability, Data Corruption, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Filesystems, Hadoop, Hadoop Clusters, Hedvig, Hitachi Vantara, Hyperconvergence, ILM, Infrascale, Machine Learning, MapReduce, Minio, NAS, NetApp, Object Storage, Scale-out architecture, Software Defined Storage, Software-defined Datacenter, Storage Field Day, Storage Tiering, Tape storage, Tech Field Day, Unified Storage, Veeam, Veritas, Zerto

1 Comment

I woke up at 2.59am on the morning of Sept 5th, a bit discombobulated, and quickly jumped into the Commvault call. The damn alarm rang and I slept through it, but I got up just in time for the 3am call.

As I was going through the motions of getting onto UberConference, organized by GestaltIT, I was already sensing something big. In the call, Commvault was acquiring Hedvig, and it hit me. My drowsy self snapped to attention at the big news. And I saw a few guys from Veritas and Cohesity on my social media group making gestures about the acquisition.

I spent the rest of the week thinking about the acquisition. What is good? What is bad? How is Commvault going to move forward? This was set against the stark backdrop of the rumour mill here in South Asia, just a week before this acquisition news, where I heard that the entire Commvault teams in Malaysia and Asia Pacific were released. I couldn’t confirm the news in Asia Pacific, but the source of the news coming from Malaysia was a strong and reliable one.

What is good?

It is a big win for Hedvig. Nestled among several scale-out primary storage vendors with little competitive differentiation, this Commvault acquisition is Hedvig’s pay day.

Continue reading →

Oracle Cloud Infrastructure to prove skeptics wrong

By cfheoh | October 24, 2018 - 6:53 am |October 24, 2018 Amazon, Analytics, Artificial Intelligence, Big Data, Cloud, Clusters, Data Availability, Data Management, Deep Learning, Disaster Recovery, Flash, High Performance Computing, Machine Learning, MapReduce, Object Storage, Oracle, Oracle Cloud, Performance Benchmark, Reliability, Scale-out architecture, Software-defined Datacenter, Storage Field Day, Tech Field Day, Virtualization

1 Comment

[Preamble: I have been invited by GestaltIT as a delegate to their TechFieldDay from Oct 17-19, 2018 in the Silicon Valley USA. My expenses, travel and accommodation are covered by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

The much maligned Oracle Cloud is getting a fresh reboot, starting with their Oracle Cloud Infrastructure (OCI), and significant enhancements and technology updates were announced at the Oracle Open World this week. I had the privilege to hear about Oracle Cloud’s new attack plan when they presented at Tech Field Day 17 last week.

Oracle Cloud has not had the best of days in recent months. Thomas Kurian’s resignation as their President of Product Development was highly publicized, following a disagreement with CTO and founder Larry Ellison over cloud software strategy. Then there was an on-going lawsuit about how Oracle was misrepresenting their cloud revenue growth, which puts Oracle in a bad light.

On the local front here in Malaysia, I have heard through the grapevine of the aggressive nature of Oracle personnel pushing partners and customers to adopt their cloud services using legal scare tactics on their database licensing. A buddy of mine, who was previously the cloud business development manager at CTC Global, also shared Oracle’s cloud shortcomings compared to Amazon Web Services and Microsoft Azure a year ago.

The Oracle Cloud Infrastructure team aimed to turn around the bad perceptions, starting with the delegates of Tech Field Day 17, including yours truly. Their strategy was clear. Oracle Cloud Infrastructure runs the highest performance and the highest enterprise grade Infrastructure-as-a-Service (IaaS), bar none. Unlike the IBM Cloud, which in my opinion is a wishy-washy cloud service platform, Oracle Cloud’s ambition is solid.

They did a demo of the JD Edwards EnterpriseOne application, and they continued to demonstrate their prowess in running the highest performance computing experience for all enterprise-grade workloads. And that enterprise pedigree is clear.

Just this week, Amazon Prime Day had an outage. Amazon is in the process of weaning Oracle database from their entire ecosystem by 2020, and this outage clearly showed that the Oracle database and the enterprise applications would only run best on Oracle Cloud Infrastructure.

Continue reading →

The Network is Still the Computer

By cfheoh | October 22, 2018 - 11:35 am |October 22, 2018 100Gigabit Ethernet, 10Gigabit Ethernet, Analytics, API, Artificial Intelligence, Big Data, Cisco, Clusters, Data Management, Deep Learning, Disks, Drivescale, Fibre Channel, Filesystems, Hadoop, Hadoop Clusters, High Performance Computing, Infiniband, iSCSI, Linux, Machine Learning, MapReduce, NFS, NVMe, PCIe, Performance Benchmark, RDMA, Scale-out architecture, Storage Field Day, Tech Field Day, Virtualization

3 Comments

Sun Microsystems coined the phrase “The Network is the Computer“. It became one of the most powerful ideologies in the computing world. Over the years, many technology companies have tried to emulate and practise the mantra, but fell short.

I had never heard of Drivescale. It wasn’t on my radar until the legendary NFS guru, Brian Pawlowski, joined them in April this year. Beepy, as he is known, was CTO of NetApp and later at Pure Storage, and held many technology leadership roles, including leading the development of NFSv3 and v4.

Prior to Tech Field Day 17, I was given some “homework”. Stephen Foskett, Chief Cat Herder (as he is known) of Tech Field Days and Storage Field Days, highly recommended Drivescale and asked the delegates to pick up some notes on their technology. Going through a couple of the videos, Drivescale’s message and philosophy resonated well with me. Perhaps it was their Sun Microsystems DNA? Many of the Drivescale team members were from Sun, and I was previously from Sun as well. I was drinking Sun’s Kool Aid by the bucket loads even before I graduated in 1991, and so what Drivescale preached made a lot of sense to me.

Drivescale is all about Scale-Out Architecture at the webscale level, to address the massive scale of data processing. To understand it more deeply, we must think about “Data Locality” and “Data Mobility“. I frequently use these 2 “points of discussion” in my consulting practice in architecting and designing data center infrastructure. The gist of data locality is simple – the closer the data is to the processing, the cheaper, lighter and more efficient the processing gets. Moving data – the data mobility part – is expensive.

Continue reading →

Hammering Next Gen Hybrid Clouds

By cfheoh | October 18, 2018 - 8:51 pm |October 19, 2018 Acquisition, Analytics, Appliance, Artificial Intelligence, CIFS, Cloud, Data, Data Fabric, Data Management, Deduplication, Disaster Recovery, Filesystems, Hammerspace, High Performance Computing, Hyperconvergence, Machine Learning, MapReduce, NAS, NetApp, NFS, Object Storage, Performance Caching, Reliability, Software-defined Datacenter, Storage Field Day, Storage Tiering, Tech Field Day, Virtualization

2 Comments

[Preamble: I have been invited by GestaltIT as a delegate to their TechFieldDay from Oct 17-19, 2018 in the Silicon Valley USA. My expenses, travel and accommodation are paid by GestaltIT, the organizer and I was not obligated to blog or promote their technologies presented at this event. The content of this blog is of my own opinions and views]

Hammerspace came out of stealth 2 days ago. Their objective? To rule the world of data for hybrid clouds and multi-clouds, and provide “unstructured data anywhere on-demand“. That is a bold statement, for a company that is relatively unknown, except for its deep ties with the now defunct Primary Data. Primary Data’s Chairman, David Flynn, is the head honcho at Hammerspace.

The Hammerspace technology has come at the right time, in my opinion, because the entire cloud, multi-cloud and hybrid cloud stories have become fractured and siloed. The very thing that cloud computing touted to fix has brought back the same set of problems. At the same time, not every application was developed for the cloud. Applications rely on block storage services, or NAS protocols, or the de facto S3 protocols for storage repositories. However, the integration and communication between applications break down when these on-premises applications are moved to the cloud, or when applications residing in the cloud are moved back to on-premises for throughput delivery, or even when applications reside at the edge.

Continue reading →

Cohesity SpanFS – a foundational shift

By cfheoh | March 11, 2018 - 1:26 am |March 11, 2018 Analytics, API, Appliance, Big Data, Business Continuity, Cloud, Cohesity, Data, Data Archiving, Data Availability, Data Management, Deduplication, Disaster Recovery, Filesystems, High Performance Computing, Hyperconvergence, MapReduce, Nutanix, Performance Benchmark, Performance Caching, Reliability, ROBO, Scale-out architecture, Snapshots, Software Defined Storage, Software-defined Datacenter, Storage Field Day, Storage Optimization, Storage Tiering, Uncategorized, Virtualization

3 Comments

[Preamble: I was a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation were paid for by GestaltIT, the organizer and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

Cohesity SpanFS impressed me. Their filesystem was designed from the ground up to meet the demands of voluminous cloud-scale data, and yes, the sheer magnitude of data everywhere needs to be managed.

We all know that primary data is always the more important piece of the data landscape, but there is a growing need to address the secondary data segment as well.

Like a floating iceberg, the piece that is sticking out is the more important primary data, but the larger piece beneath the surface of the water, which is the secondary data, is becoming more valuable. Applications such as file shares, archiving, backup, test and development, and analytics and insights are maturing as the foundational data management frameworks and are fast becoming the bedrock of businesses.

The ability of businesses to bounce back after a disaster; the relentless testing of large data sets to develop new competitive advantage for businesses; the affirmations and the insights of analyzing data to reduce risks in decision making; all these are the powerful back engine applicability that thrust businesses forward. Even the ability to search for the right information in a sea of data for regulatory and compliance reasons is part of the organization’s data management application.

Continue reading →

Category Archives: MapReduce
