Big Data – Page 4 – Storage Gaga

Falconstor Software Defined Data Preservation for the Next Generation

By cfheoh | April 27, 2020 - 9:56 am |April 27, 2020 Amazon Web Services, Analytics, API, Appliance, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Clusters, Composable Infrastructure, compression, Containers, Data, Data Archiving, Data Availability, Data Corruption, Data Domain, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, deduplication, Deduplication, Digital Transformation, Disaster Recovery, Disks, eDiscovery, Falconstor, HDS, Linux, LTFS, LTO, LTO-8, Microsoft, Microsoft Azure, NetApp, RAID, Scale-out architecture, Software Defined Storage, Software-defined Datacenter, Starwind, Storage Tiering, Tape storage, virtual tape library, Virtualization, VTL

Reap at low tide

By cfheoh | April 13, 2020 - 11:12 am |April 13, 2020 Acquisition, Artificial Intelligence, Big Data, Business Continuity, BYOD, Cloud, Data Management, Data Security, Digital Transformation, Industry 4.0, Machine Learning, Reliability

Why are they not spending?

This news appeared in my LinkedIn feed. It read “Malaysian Banks Don’t Spend Enough on Tech“. It irked me immensely because in a soft economy climate (the low tide), our Malaysian financial institutions should be spending more on technology (reaping the opportunity) to get ahead.

Why are the storks and the egrets in my page photo above waiting and wading in the knee-deep waters? Because at low tide, when the waves ebb, food is exposed to them abundantly. They scurry for shrimps, small crabs, cockles, mussels and more. This is nature’s way.

From the report, the technology spending average among the Malaysian banks is pathetic.

The negative domino effect on SMEs

When the banks are not spending on technology, the other industries, especially the SMEs (small medium enterprises) follow suit. The “penny pinching” and “tightening purse string” effect permeates across industries, slowly and surely putting the negative effect in tech spending into a volatile spin-cycle.

From a macro-economic point of view, spending slows down. Buying less means lesser demands and effectively, lowering supply, and it rolls on. The law of demand and supply just got dumped into an abyss.

A great opportunity for those who see it

When I was an engineer at Sun Microsystems more than 2 decades ago, I read a comment delivered by one of the executives. It said “When times are bad, those who know will get the best parts“. I took his comment to heart because what he said held true, even until today.

This is the best time, when the country is experiencing an economic downturn. When the competitors are holding back and may be reeling from the negative effects of the economy, the banks are in the best position to grab the best deals. This is the time to gain market share, when the competition is holding back for fear that the economy will become softer.

Furthermore, with the low interest rates across the board, there is no better time than the present to step up the tech spending. Banks should know this very well but I am perplexed.

That is why the Malaysian banks must kick start their tech spending campaign now. And the SMEs will follow, overturning the downturn with demands of spending for the best “parts”. The supply “factories” are fired up again, and will lead to a positive growth to the economy.

Bank Negara RMiT is that one opportunity

One thing which has been looming is Bank Negara, Malaysia’s Central Bank, RMiT (Risk Management in Technology) framework. A new version was released in July 2019, and to me as an outsider, is a great opportunity to grab the best parts. And some of these standards will come into effect in January 2020

Bank Negara is strongly encouraging banks to improve the security and the confidence of the country’s financial industry, and the RMiT framework is really a prod to increase tech spending. Unfortunately, in some of my business interactions with a few of the banks, the feet dragging practice is prevalent.

Nature’s lesson

The best time to have your best pick is at low tide. This is nature’s lesson for us. What are we waiting for?

StorageGRID gets gritty

By cfheoh | March 9, 2020 - 7:06 am |March 9, 2020 Acquisition, Amazon Web Services, Analytics, API, Appliance, Artificial Intelligence, Backup, Big Data, Cloud, Clusters, Data Archiving, Data Fabric, Data Management, Data Protection, Deep Learning, Filesystems, HDS, Hitachi Vantara, ILM, Machine Learning, NAS, NetApp, Object Storage, Software Defined Storage, Storage Field Day, Storage Market Share, Storage Optimization, Tech Field Day

2 Comments

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at the event. The content of this blog is of my own opinions and views ]

NetApp® presented StorageGRID® Webscale (SGWS) at Storage Field Day 19 last month. It was timely when the general purpose object storage market, in my humble opinion, was getting disillusioned and almost about to deprive itself of the value of what it was supposed to be.

“Cheap and deep“, “Race to Zero” were some of the less storied calls I have come across when discussing about object storage, and it was really de-valuing the merits of object storage as vendors touted their superficial glory of being in the IDC Marketscape for Object-based Storage 2019.

Almost every single conversation I had in the past 3 years was either explaining what object storage is or “That is cheap storage right?”

Continue reading →

DellEMC Project Nautilus Re-imagine Storage for Streams

By cfheoh | February 24, 2020 - 5:56 am |February 25, 2020 Algorithm, Analytics, API, Artificial Intelligence, Big Data, Cloud, Confluent, Data, Data Management, Deep Learning, Dell, DellEMC, Edge Computing, EMC, Fog Computing, Industry 4.0, InfluxDB, IoT, Isilon, Kubernetes, Linux, Machine Learning, Pravega, Storage Field Day, Tech Field Day

2 Comments

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog is of my own opinions and views ]

Cloud computing will have challenges processing data at the outer reach of its tentacles. Edge Computing, as it melds with the Internet of Things (IoT), needs a different approach to data processing and data storage. Data generated at source has to be processed at source, to respond to the event or events which have happened. Cloud Computing, even with 5G networks, has latency that is not sufficient to how an autonomous vehicle react to pedestrians on the road at speed or how a sprinkler system is activated in a fire, or even a fraud detection system to signal money laundering activities as they occur.

Furthermore, not all sensors, devices, and IoT end-points are connected to the cloud at all times. To understand this new way of data processing and data storage, have a look at this video by Jay Kreps, CEO of Confluent for Kafka® to view this new perspective.

Data is continuously and infinitely generated at source, and this data has to be compiled, controlled and consolidated with nanosecond precision. At Storage Field Day 19, an interesting open source project, Pravega, was introduced to the delegates by DellEMC. Pravega is an open source storage framework for streaming data and is part of Project Nautilus.

Rise of streaming time series Data

Processing data at source has a lot of advantages and this has popularized Time Series analytics. Many time series and streams-based databases such as InfluxDB, TimescaleDB, OpenTSDB have sprouted over the years, along with open source projects such as Apache Kafka®, Apache Flink and Apache Druid.

The data generated at source (end-points, sensors, devices) is serialized, timestamped (as event occurs), continuous and infinite. These are the properties of a time series data stream, and to make sense of the streaming data, new data formats such as Avro, Parquet, Orc pepper the landscape along with the more mature JSON and XML, each with its own strengths and weaknesses.

You can learn more about these data formats in the 2 links below:

DIY is difficult

Many time series projects started as DIY projects in many organizations. And many of them are still DIY projects in production systems as well. They depend on tribal knowledge, and these databases are tied to an unmanaged storage which is not congruent to the properties of streaming data.

At the storage end, the technologies today still rely on the SAN and NAS protocols, and in recent years, S3, with object storage. Block, file and object storage introduce layers of abstraction which may not be a good fit for streaming data.

Continue reading →

Komprise is a Winner

By cfheoh | February 10, 2020 - 9:12 am |February 10, 2020 Analytics, Big Data, CIFS, Data Archiving, Data Management, DellEMC, EMC, Filesystems, ILM, Komprise, NAS, NetApp, NFS, SMB, Storage Field Day, Tech Field Day

7 Comments

[Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

I, for one perhaps have seen far too many “file lifecycle and data management” software solutions that involved tiering, hierarchical storage management, ILM or whatever you call them these days. If I do a count, I would have managed or implemented at least 5 to 6 products, including a home grown one.

The whole thing is a very crowded market and I have seen many which have come and gone, and so when the opportunity to have a session with Komprise came at Storage Field Day 19, I did not carry a lot of enthusiasm.

Continue reading →

Open Source and Open Standards open the Future

By cfheoh | February 3, 2020 - 7:19 am |February 3, 2020 API, Big Data, Cloud, Composable Infrastructure, Data Fabric, Data Security, Disks, Filesystems, Intel, Linux, Memory Cloud, nVidia, PCIe, RDMA, Solid State Devices, Storage Field Day, Tech Field Day

4 Comments

[Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

Western Digital dived into Storage Field Day 19 in full force as they did in Storage Field Day 18. A series of high impact presentations, each curated for the diverse requirements of the audience. Several open source initiatives were shared, all open standards to address present inefficiencies and designed and developed for a greater future.

Zoned Storage

One of the initiatives is to increase the efficiencies around SMR and SSD zoning capabilities and removing the complexities and overlaps of both mediums. This is the Zoned Storage initiatives a technical working proposal to the existing NVMe standards. The resulting outcome will give applications in the user space more control on the placement of data blocks on zone aware devices and zoned SSDs, collectively as Zoned Block Device (ZBD). The implementation in the Linux user and kernel space is shown below:

Continue reading →

Hadoop is truly dead – LOTR version

By cfheoh | January 24, 2020 - 1:06 pm |January 24, 2020 Acquisition, Analytics, API, Artificial Intelligence, Big Data, Cloud, Cloudera, Containers, Data Management, Data Security, Deep Learning, Digital Transformation, Hadoop, Hadoop Clusters, Kubernetes, MapReduce, NAS, NetApp, Object Storage, Pure Storage, Storage Field Day, Tech Field Day

2 Comments

[Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

This blog was not intended because it was not in my plans to write it. But a string of events happened in the Storage Field Day 19 week and I have the fodder to share my thoughts. Hadoop is indeed dead.

Warning: There are Lord of the Rings references in this blog. You might want to do some research. 😉

Storage metrics never happened

The fellowship of Arjan Timmerman, Keiran Shelden, Brian Gold (Pure Storage) and myself started at the office of Pure Storage in downtown Mountain View, much like Frodo Baggins, Samwise Gamgee, Peregrine Took and Meriadoc Brandybuck forging their journey vows at Rivendell. The podcast was supposed to be on the topic of storage metrics but was unanimously swung to talk about Hadoop under the stewardship of Mr. Stephen Foskett, our host of Tech Field Day. I saw Stephen as Elrond Half-elven, the Lord of Rivendell, moderating the podcast as he would have in the plans of decimating the One Ring in Mount Doom.

So there we were talking about Hadoop, or maybe Sauron, or both.

The photo of the Oliphaunt below seemed apt to describe the industry attacks on Hadoop.

Continue reading →

AI needs data we can trust

By cfheoh | January 22, 2020 - 5:42 am |January 22, 2020 Algorithm, Analytics, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Data, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Machine Learning

Misplacing Digital Transformation Priorities

By cfheoh | January 20, 2020 - 2:29 pm |January 20, 2020 Backup, Big Data, Business Continuity, BYOD, CIFS, Citrix, Cloud, Data, Data Archiving, Data Availability, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Digital Transformation, Disaster Recovery, Dropbox, Filesystems, FreeNAS, Katana Logic, Microsoft, NAS, NetApp, NFS, Object Storage, QNAP, Reliability, SMB

1 Comment

[ Note: This article was published on LinkedIn on Jan 20th 2020. Here is the link to the original article ]

Digital Transformation is again a big word for 2020. As more and more organizations becoming digitalized, the opportunity to communicate, interact and collaborate has become easier, faster, more convenient than ever.

File Sharing forever

Working in projects, file sharing is a fundamental activity that underpins communication and collaboration. Network drives via NAS (network attached storage) for file sharing are common within the confines of the company network. The perimeter of the company’s network is further extended via VPN (virtual private network) access, allowing branch offices and remote individuals to access the files from the central NAS server. It is a workable solution albeit poor network performance in delivery, challenges of siloed data management and difficult scalability.

The phenomenon of Dropbox

When Dropbox arrived circa 2008-2009, it took the industry by storm. They practically invented the term BYOD (bring your own device) and capture the imagination of the file sharing market. Gartner recognized this and coined EFSS (enterprise file sync and share) to consolidate the burgeoning file sharing market. Pretenders and challengers flooded the market, and after the shakedown, Box.net, Microsoft OneDrive, Google Drive and of course, Dropbox, are some of the market leaders today.

A recent report by Markets & Markets listed these companies as players in the EFSS market.

EFSS Players by Markets & Markets October 2019

As the wheels of Digital Transformation turn, EFSS is changing as well. Gartner EFSS is now the CCP (content collaboration platform), releasing their Gartner Content Collaboration Platforms MarketPeer Insights report in April 2019. Continue reading →

Is General Purpose Object Storage disenfranchised?

By cfheoh | December 23, 2019 - 5:40 pm |January 14, 2020 100Gigabit Ethernet, Amazon Web Services, Analytics, API, Artificial Intelligence, Big Data, BYOD, Ceph, Cloud, Cloudian, Clusters, Deep Learning, DellEMC, Docker, Dropbox, Edge Computing, Filesystems, Flash, Gartner, Hadoop, HDS, High Performance Computing, Hitachi Vantara, IDC, Industry 4.0, IoT, Lustre, Machine Learning, Mellanox Technologies, Minio, NetApp, Object Storage, OpenIO, Openstack, Performance Benchmark, Reliability, Scale-out architecture, Software Defined Storage, Storage Field Day, Storage Market Share, swiftstack, Tape storage, Tech Field Day

6 Comments

[Disclosure: I am invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees will be covered by GestaltIT, the organizer and I am not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog is of my own opinions and views]

This is NOT an advertisement for coloured balls.

This is the license to brag for the vendors in the next 2 weeks or so, as we approach the 2020 new year. This, of course, is the latest 2019 IDC Marketscape for Object-based Storage, released last week.

My object storage mentions

I have written extensively about Object Storage since 2011. With different angles and perspectives, here are some of them:

The Future is Intelligent Objects (2011)
What should be Cloud Storage? (2011)
APIs that stick in Storage (2012)
Has Object Storage become the Everything Store? (2013)
Of Object Storage, Filesystems and Multicloud (2017)
My Dilemma of Stateful Storage Marriage (2018)
The Malaysian Openstack Storage Conundrum (2018)
Sleepless in Malaysia with Object Storage (2019)
The Waning Light of Openstack Swift (2019)

Continue reading →

Category Archives: Big Data

Falconstor Software Defined Data Preservation for the Next Generation

Reap at low tide

Why are they not spending?

The negative domino effect on SMEs

A great opportunity for those who see it

Bank Negara RMiT is that one opportunity

Nature’s lesson

DellEMC Project Nautilus Re-imagine Storage for Streams

Rise of streaming time series Data

DIY is difficult

Komprise is a Winner

Open Source and Open Standards open the Future

Zoned Storage

Continue reading →

Hadoop is truly dead – LOTR version

Storage metrics never happened

Recent Posts

Sponsored Ads

Google Adsense

Recent Comments

Google Adsense

Share this:

Why are they not spending?

The negative domino effect on SMEs

A great opportunity for those who see it

Bank Negara RMiT is that one opportunity

Nature’s lesson

Share this:

Share this:

Rise of streaming time series Data

DIY is difficult

Share this:

Share this:

Zoned Storage

Share this:

Storage metrics never happened

Share this:

Share this:

File Sharing forever

The phenomenon of Dropbox

Share this:

My object storage mentions

Share this:

Recent Posts

Sponsored Ads

Google Adsense

Recent Comments

Google Adsense