Data Archives - Page 4 of 10

The Edge is coming! The Edge is coming!

By cfheoh | October 12, 2020 - 9:15 am |October 11, 2020 100Gigabit Ethernet, Analytics, Big Data, Containers, Data, Deep Learning, Edge Computing, Flash, Industry 4.0, InfluxDB, Linux, Machine Learning, Mellanox, Mellanox Technologies, Minio, nVidia, NVMe, Pravega, SNIA, Solid State Devices

Leave a comment

Actually, Edge Computing is already here. It has been here on everyone’s lips for quite some time, but for me and for many others, Edge Computing is still a hodgepodge of many things. The proliferation of devices, IoT, sensor, end points being pulled into the ubiquitous term of Edge Computing has made the scope ever changing, and difficult to pin down. And it is this proliferation of edge devices that will generate voluminous amount of data. Obvious questions emerge:

How to do you store all the data?
How do you process all the data?
How do you derive competitive value from the data from these edge devices?
How do you securely transfer and share the data?

From the storage technology perspective, it might be easier to observe what are the traits of the data generated on the edge device. In this blog, we also observe what could some new storage technologies out there that could be part of the Edge Computing present and future.

Edge Computing overview – Cloud to Edge to Endpoint

Storage at the Edge

The mantra of putting compute as close to the data and processing it where it is stored is the main crux right now, at least where storage of the data is concerned. The latency to the computing resources on the cloud and back to the edge devices will not be conducive, and in many older settings, these edge devices in factory may not be even network enabled. In my last encounter several years ago, there were more than 40 interfaces, specifications and protocols, most of them proprietary, for the edge devices. And there is no industry wide standard for these edge devices too.

Continue reading →

Storage in a shiny multi-cloud space

By cfheoh | September 14, 2020 - 6:22 pm |September 14, 2020 Amazon, Amazon Web Services, Analytics, API, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Clumio, Containers, Data, Data Archiving, Data Availability, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Disaster Recovery, Docker, Druva, Gartner, Google, Google Anthos, High Performance Computing, Kubernetes, Machine Learning, Microsoft, Microsoft Azure, Object Storage, Oracle, Oracle Cloud, Rackspace, Software Defined Storage, Software-defined Datacenter, Storage Tiering, Wasabi Cloud

Leave a comment

The multi-cloud for infrastructure-as-a-service (IaaS) era is not here (yet). That is what the technology marketers want you to think. The hype, the vapourware, the frenzy. It is what they do. The same goes to technology analysts where they describe vision and futures, and the high level constructs and strategies to get there. The hype of multi-cloud is often thought of running applications and infrastructure services seamlessly in several public clouds such as Amazon AWS, Microsoft® Azure and Google Cloud Platform, and linking it to on-premises data centers and private clouds. Hybrid is the new black.

Multicloud connectivity to public cloud providers and on-premises private cloud

Multi-Cloud, on-premises, public and hybrid clouds

And the aspiration of multi-cloud is the right one, when it is truly ready. Gartner® wrote a high level article titled “Why Organizations Choose a Multicloud Strategy“. To take advantage of each individual cloud’s strengths and resiliency in respective geographies make good business sense, but there are many other considerations that cannot be an afterthought. In this blog, we look at a few of them from a data storage perspective.

In the beginning there was …

For this storage dinosaur, data storage and compute have always coupled as one. In the mainframe DASD days. these 2 were together. Even with the rise of networking architectures and protocols, from IBM SNA, DECnet, Ethernet & TCP/IP, and Token Ring FC-SAN (sorry, this is just a joke), the SANs, the filers to the servers were close together, albeit with a network buffered layer.

A decade ago, when the public clouds started appearing, data storage and compute were mostly inseparable. There was demarcation of public clouds and private clouds. The notion of hybrid clouds meant public clouds and private clouds can intermix with on-premise computing and data storage but in almost all cases, this was confined to a single public cloud provider. Until these public cloud providers realized they were not able to entice the larger enterprises to move their IT out of their on-premises data centers to the cloud convincingly. So, these public cloud providers decided to reverse their strategy and peddled their cloud services back to on-prem. Today, Amazon AWS has Outposts; Microsoft® Azure has Arc; and Google Cloud Platform launched Anthos.

Continue reading →

Persistent Storage could stifle Google Anthos multi-cloud ambitions

By cfheoh | July 27, 2020 - 9:15 am |July 26, 2020 Amazon Web Services, Analytics, API, Big Data, Cloud, Clusters, Containers, Data, Data Management, DellEMC, Docker, Google, Google Anthos, HPE, Kubernetes, Microsoft, Microsoft Azure, NetApp, Object Storage, Portworx, Pure Storage, Robin.io

Leave a comment

To win in the multi-cloud game, you have to be in your competitors’ cloud. Google Cloud has been doing that since they announced Google Anthos just over a year ago. They have been crafting their “assault”, starting with on-premises, and Anthos on AWS. Anthos on Microsoft® Azure is coming, currently in preview mode.

Google CEO Sundar Pichai announcing Google Anthos at Next ’19

BigQuery Omni conversation starter

2 weeks ago, whilst the Google Cloud BigQuery Omni announcement was still under wraps, local Malaysian IT portal Enterprise IT News sent me the embargoed article to seek my views and opinions. I have to admit that I was ignorant about the deeper workings of BigQuery, and haven’t fully gone through the works of Google Anthos as well. So I researched them.

Having done some small works on Qubida (defunct) and Talend several years ago, I have grasped useful data analytics and data enablement concepts, and so BigQuery fitted into my understanding of BigQuery Omni quite well. That triggered my interests to write this blog and meshing the persistent storage conundrum (at least for me it is something to be untangled) to Kubernetes, to GKE (Google Kubernetes Engine), and thus Anthos as well.

For discussion sake, here is an overview of BigQuery Omni.

An overview of Google Cloud BigQuery Omni on multiple cloud providers

My comments and views are in this EITN article “Google Cloud’s BigQuery Omni for Multi-cloud Analytics”.

Continue reading →

Commvault Metallic with Microsoft Azure – from seismic to tectonic

By cfheoh | July 13, 2020 - 9:30 am |July 11, 2020 Backup, Business Continuity, BYOD, Cloud, Clumio, Commvault, Data, Data Archiving, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Digital Transformation, Disaster Recovery, Druva, Hedvig, Kubernetes, Microsoft, Microsoft Azure, Object Storage, OwnBackup, Software Defined Storage, Software-defined Datacenter

4 Comments

When the elephant rumbles in the jungle, the whole village takes notice. That was what happened 2 weeks ago when Commvault® announced their multi-year agreement to place Metallic™ into deeper integration with Microsoft® Azure. This strategic partnership will consummate several key areas between the 2 companies, which are Engineering, Go To Market (GTM) and Sales.

The “low hanging fruit” move is of course the tight(er) integration with Microsoft® Azure Blob Storage but the more exciting anticipation is “What else is next“.

Metallic – A Commvault venture

Cloud First, Cloud Next, Cloud Big

The Metallic™ SaaS (Software-as-a-Service) offering isn’t new. It was unveiled as a Commvault® venture at Commvault® GO 2019, but it was only open to the US market then. Earlier last month, Commvault® switched on Metallic™ for the Canadian market, and it is inevitable that Metallic™ is coming to a region near you and I. ANZ is next, according to the grapevine, followed by Europe.

An O’Reilly® Media Cloud Adoption survey in January and February 2020 (just before the COVID-19 pandemic) revealed that 25% of the respondents “said that their companies plan to move all of their applications to a cloud context in the next year“. This is no coincidence. It is now Cloud First; Cloud Next; Cloud Big.

Continue reading →

Falconstor Software Defined Data Preservation for the Next Generation

By cfheoh | April 27, 2020 - 9:56 am |April 27, 2020 Amazon Web Services, Analytics, API, Appliance, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Clusters, Composable Infrastructure, compression, Containers, Data, Data Archiving, Data Availability, Data Corruption, Data Domain, Data Fabric, Data Management, Data Privacy, Data Protection, Data Security, Deduplication, deduplication, Digital Transformation, Disaster Recovery, Disks, eDiscovery, Falconstor, HDS, Linux, LTFS, LTO, LTO-8, Microsoft, Microsoft Azure, NetApp, RAID, Scale-out architecture, Software Defined Storage, Software-defined Datacenter, Starwind, Storage Tiering, Tape storage, virtual tape library, Virtualization, VTL

Leave a comment

Falconstor® Software is gaining momentum. Given its arduous climb back to the fore, it is beginning to soar again.

Tape technology and Digital Data Preservation

I mentioned that long term digital data preservation is a segment within the data lifecycle which has merits and prominence. SNIA® has proved that this is a strong growing market segment through its 2007 and 2017 “100 Year Archive” surveys, respectively. 3 critical challenges of this long, long-term digital data preservation is to keep the archives

Accessible
Undamaged
Usable

For the longest time, tape technology has been the king of the hill for digital data preservation. The technology is cheap, mature, and many enterprises has built their long term strategy around it. And the pulse in the tape technology market is still very healthy.

The challenges of tape remain. Every 5 years or so, companies have to consider moving the data on the existing tape technology to the next generation. It is widely known that LTO can read tapes of the previous 2 generations, and write to it a generation before. The tape transcription process of migrating digital data for the sake of data preservation is bad because it affects the structural integrity and quality of the content of the data.

In my times covering the Oil & Gas subsurface data management, I have seen NOCs (national oil companies) with 500,000 tapes of all generations, from 1/2″ to DDS, DAT to SDLT, 3590 to LTO 1-7. And millions are spent to transcribe these tapes every few years and we have folks like Katalyst DM, Troika and more hovering this landscape for their fill.

Continue reading →

The Falcon to soar again

By cfheoh | April 20, 2020 - 8:30 am |April 18, 2020 Amazon Web Services, Analytics, API, Backup, Business Continuity, Cloud, Clusters, Containers, Data, Data Archiving, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Deduplication, Digital Transformation, Disaster Recovery, Falconstor, LTFS, LTO, LTO-8, Object Storage, RAID, Scale-out architecture, SNIA, Software Defined Storage, Tape storage, virtual tape library, Virtualization, VTL

Leave a comment

One of the historical feats which had me mesmerized for a long time was the 14-year journey China’s imperial treasures took to escape the Japanese invasion in the early 1930s, sandwiched between rebellions and civil wars in China. More than 20,000 pieces of the imperial treasures took a perilous journey to the west and back again. Divided into 3 routes over a decade and four years, not a single piece of treasure was broken or lost. All in the name of preservation.

Today, that 20,000 over pieces live in perpetuity in 2 palaces – Beijing Palace Museum in China and National Palace Museum Taipei in Taiwan

Digital data preservation

Digital data preservation is on another end of the data lifecycle spectrum. More often than not, it is not the part that many pay attention to. In the past 2 decades, digital data has grown so much that it is now paramount to keep the data forever. Mind you, this is not the data hoarding kind but to preserve the knowledge and wisdom which is in the digital content of the data.

[ Note: If you are interested to know more about Data -> Information -> Knowledge -> Wisdom, check out my 2015 article on LinkedIn ]

SNIA (Storage Networking Industry Association) conducted 2 surveys – one in 2007 and another in 2017 – called the 100 Year Archive, and found that the requirement for preserving digital data has grown multiple folds over the 10 years. In the end, the final goal is to ensure that the perpetual digital contents are

Accessible
Undamaged
Usable

All at an affordable cost. Therefore, SNIA has the vision that the digital content must transcend beyond the storage medium, the storage system and the technology that holds it.

The Falcon reemerges

A few weeks ago, I had the privilege to speak with Falconstor® Software‘s David Morris (VP of Global Product Strategy & Marketing) and Mark Delsman (CTO). It was my first engagement with Falconstor® in almost 9 years! I wrote a piece of Falconstor® in my blog in 2011.

Continue reading →

4 Digital Workplace Moves after COVID-19

By cfheoh | March 25, 2020 - 3:24 pm |March 25, 2020 Appliance, Business Continuity, Cloud, Data, Data Availability, Data Management, Data Protection, Data Security, Digital Transformation, Disaster Recovery, SD-WAN

3 Comments

[ Note: This article was published on LinkedIn on March 24, 2020. Here is the link to the original article ]

We live in unprecedented times. Malaysia has been in Movement Control Order (MCO) Day 7, which is basically a controlled lockdown of movements and activities. In many cases, businesses have grounded to a halt, and the landscape has changed forever. The “office” will not always be a premise anymore, and the “meetings” will not be a physical face-to-face conversation to build relationships and trust.

Trust is vital. A couple of weeks ago, I wrote 關係 (Guan Xi), and having to re-invent Trust in a Digital World.

The impact on organizations and businesses is deep and powerful and so, as we move forward when the COVID-19 pandemic dies down, organizations’ plans in their Digital Transformation strategy will change as well.

Here are 4 technology areas which I think must take precedence for the Digital Workplace in the Digital Transformation strategy.

Software-Defined Wide Area Network (SD-WAN)

Physically connections have been disrupted. Digital connections are on the rise to supplant “networking” in our physical business world, and the pandemic situation just tipped the scale.

Many small medium businesses (SMBs) rely on home broadband, which may be good enough for some. Medium to large organizations have broadband for business. Larger organizations which have deeper pockets might already have MPLS (multiprotocol label switching) or leased line in place. A large portion might have VPN (virtual private network) set up too.

In time, SD-WAN (software-defined wide area network) services should be considered more profoundly. SD-WAN is a more prudent approach that inculcates digital workplace policies such as quality of service (QOS) for critical data connections, allocating network attributes to different data workloads and network traffic, VPN features and most come with enhanced security addendum as well. .

In addition to performance, security and capacity control, SD-WAN implementation helps shape employees’ digital workplace practices but most importantly, redefine the organization’s processes and conditioning employees’ mindsets in the Digital Transformation journey.

Video Meetings & Conferencing

The Video Meetings and Conferencing solutions have become the poster child in the present pandemic situation. Zoom, Webex, Microsoft Teams, Skype (it is going away), GoToMeetings and more are dominating the new norm of work. Work from home (WFH) has a totally new meaning now, especially for employees who have been conditioned to work in an “office”.

I had more than 15 Zoom meetings (the free version) last week when the Malaysian MCO started, and Zoom has become a critical part of my business now, and thus, it is time to consider paid solutions like Zoom or Webex as part of an organization’s Digital Workplace plans. These will create the right digital culture for the new Digital Workplace.

Personally I like Uberconference because of their on-hold song. It is sang by their CEO, Alex Cornell. Check out this SoundCloud recording.

File Sharing

Beneath the hallowed halls of video meetings and conferencing, collaboration happens with data shared in files. We have been with file and folders from our C: drives or NAS Home Directories or File Server’s shared drives that these processes are almost second nature to us.

In the face of this COVID-19 pandemic, files and information sharing has become cumbersome. The shared drive is no longer in our network, because we are not in the organization’s LAN and intranet anymore. We are working at home, away from the gigabit network, protected by the organization’s firewall, and was once slaved … err … I mean supported by our IT admins.

The obvious reaction (since you can’t pass thumb drives anymore at present) is to resort to Dropbox, OneDrive, Google Drive and others, and hoping you won’t max out your free capacity. Or email attachments in emails going back and forth, and hoping the mail server will not reject files larger than 10MB.

The fortunate ones have VPN client on their laptops but the network backhaul traffic to the VPN server at the central VPN server, and overloading it to the max. Pretty soon, network connections are dropped, and the performance of file sharing sucks! Big time!

What if your organization is a bank? Or an Oil & Gas company where data protection and data sovereignty dictate the order of the day? All the very-public enterprise file sync and share (EFSS) like Dropbox or OneDrive or Google Drive totally violate the laws of the land, and your organization may be crippled by the inability to do work. After all, files and folders are like the peanut-butter-jelly or the nasi lemak-teh tarik (coconut rice & pulled tea Malaysian breakfast) combo of work. You can’t live without files and folders.

The thoughts of having a PRIVATE on-premises EFSS solution in your organization’s Digital Transformation strategy should be moved from the KIV (keep in view) tray to a defined project in the Digital Transformation programme.

At Katana Logic, we work with Easishare, and it is worth having a serious plan about building your own private file share and sync solution as part of the Digital Workplace.

Security

In such unprecedented times, where our attention is diverted, cybersecurity threats are at its highest. Financial institutions in Malaysia have already been summoned by Malaysia Bank Negara central bank to build the industry’s expectations and confidence through the RMiT framework. Conversations with some end users and IT suppliers to Malaysian banks and other financial institutions unfortunately, revealed the typical lackadaisical attitude to fortify cyber resiliency practices within these organizations. I would presume the importance of cybersecurity and cyber resiliency practices would take a even further back seat with small medium businesses.

On a pessimistic note, ransomware and DDOS (distributed denial-of-service) have been on the rise, and taking advantage of this pandemic situation. NAS, the network attached storage that serves the organization shared files and folders has become ransomware’s favourite target as I have wrote in my blog.

But it does not have to be expensive affair with cybersecurity. Applying a consistent periodical password change, educating employees about phishing emails, using a simple but free port scanners to look at open TCP/UDP ports can be invaluable for small medium businesses. Subscribing to penetration testing (pentest) services at a regular frequency is immensely helpful as well.

In larger organizations, cyber resiliency is more holistic. Putting in layers for defense in depth, CIA (confidentiality, integrity, availability) triad, AAA (authentication, authorization, audit) pro-active measures are all part of the cybersecurity framework. These holistic practices must effect change in people and the processes of how data and things are shared, used, protected and recovered in the whole scheme of things.

Thus organizations must be vigilant and do their due diligence. We must never bat any eye to fortify cyber security and cyber resiliency in the Digital Workplace.

Parting thoughts

We are at our most vulnerable stage of our lifetime but it is almost the best time to understand what is critical to our business. This pandemic is helping to identify the right priorities for Work.

At any level, regardless, organizations have to use the advantage of this COVID-19 situation to assess how it has impacted business. It must look at what worked and what did not in their digital transformation journey so far, and change the parts that were not effective.

I look at the 4 areas of technology that I felt it could make a difference and I am sure there are many more areas to address. So, use this pessimistic times and turn it into an optimistic one when we are back to normalcy. The Digital Workplace has changed forever, and for the better too.

Continue reading →

DellEMC Project Nautilus Re-imagine Storage for Streams

By cfheoh | February 24, 2020 - 5:56 am |February 25, 2020 Algorithm, Analytics, API, Artificial Intelligence, Big Data, Cloud, Confluent, Data, Data Management, Deep Learning, Dell, DellEMC, Edge Computing, EMC, Fog Computing, Industry 4.0, InfluxDB, IoT, Isilon, Kubernetes, Linux, Machine Learning, Pravega, Storage Field Day, Tech Field Day

2 Comments

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog is of my own opinions and views ]

Cloud computing will have challenges processing data at the outer reach of its tentacles. Edge Computing, as it melds with the Internet of Things (IoT), needs a different approach to data processing and data storage. Data generated at source has to be processed at source, to respond to the event or events which have happened. Cloud Computing, even with 5G networks, has latency that is not sufficient to how an autonomous vehicle react to pedestrians on the road at speed or how a sprinkler system is activated in a fire, or even a fraud detection system to signal money laundering activities as they occur.

Furthermore, not all sensors, devices, and IoT end-points are connected to the cloud at all times. To understand this new way of data processing and data storage, have a look at this video by Jay Kreps, CEO of Confluent for Kafka® to view this new perspective.

Data is continuously and infinitely generated at source, and this data has to be compiled, controlled and consolidated with nanosecond precision. At Storage Field Day 19, an interesting open source project, Pravega, was introduced to the delegates by DellEMC. Pravega is an open source storage framework for streaming data and is part of Project Nautilus.

Rise of streaming time series Data

Processing data at source has a lot of advantages and this has popularized Time Series analytics. Many time series and streams-based databases such as InfluxDB, TimescaleDB, OpenTSDB have sprouted over the years, along with open source projects such as Apache Kafka®, Apache Flink and Apache Druid.

The data generated at source (end-points, sensors, devices) is serialized, timestamped (as event occurs), continuous and infinite. These are the properties of a time series data stream, and to make sense of the streaming data, new data formats such as Avro, Parquet, Orc pepper the landscape along with the more mature JSON and XML, each with its own strengths and weaknesses.

You can learn more about these data formats in the 2 links below:

DIY is difficult

Many time series projects started as DIY projects in many organizations. And many of them are still DIY projects in production systems as well. They depend on tribal knowledge, and these databases are tied to an unmanaged storage which is not congruent to the properties of streaming data.

At the storage end, the technologies today still rely on the SAN and NAS protocols, and in recent years, S3, with object storage. Block, file and object storage introduce layers of abstraction which may not be a good fit for streaming data.

Continue reading →

AI needs data we can trust

By cfheoh | January 22, 2020 - 5:42 am |January 22, 2020 Algorithm, Analytics, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Data, Data Availability, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Digital Transformation, Machine Learning

Leave a comment

[ Note: This article was published on LinkedIn on Jan 21th 2020. Here is the link to the original article ]

In 2020, the intensity on the topic of Artificial Intelligence will further escalate.

One news which came out last week terrified me. The Sarawak courts want to apply Artificial Intelligence to mete judgment and punishment, perhaps on a small scale.

Continue reading →

Misplacing Digital Transformation Priorities

By cfheoh | January 20, 2020 - 2:29 pm |January 20, 2020 Backup, Big Data, Business Continuity, BYOD, CIFS, Citrix, Cloud, Data, Data Archiving, Data Availability, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Digital Transformation, Disaster Recovery, Dropbox, Filesystems, FreeNAS, Katana Logic, Microsoft, NAS, NetApp, NFS, Object Storage, QNAP, Reliability, SMB

1 Comment

[ Note: This article was published on LinkedIn on Jan 20th 2020. Here is the link to the original article ]

Digital Transformation is again a big word for 2020. As more and more organizations becoming digitalized, the opportunity to communicate, interact and collaborate has become easier, faster, more convenient than ever.

File Sharing forever

Working in projects, file sharing is a fundamental activity that underpins communication and collaboration. Network drives via NAS (network attached storage) for file sharing are common within the confines of the company network. The perimeter of the company’s network is further extended via VPN (virtual private network) access, allowing branch offices and remote individuals to access the files from the central NAS server. It is a workable solution albeit poor network performance in delivery, challenges of siloed data management and difficult scalability.

The phenomenon of Dropbox

When Dropbox arrived circa 2008-2009, it took the industry by storm. They practically invented the term BYOD (bring your own device) and capture the imagination of the file sharing market. Gartner recognized this and coined EFSS (enterprise file sync and share) to consolidate the burgeoning file sharing market. Pretenders and challengers flooded the market, and after the shakedown, Box.net, Microsoft OneDrive, Google Drive and of course, Dropbox, are some of the market leaders today.

A recent report by Markets & Markets listed these companies as players in the EFSS market.

EFSS Players by Markets & Markets October 2019

As the wheels of Digital Transformation turn, EFSS is changing as well. Gartner EFSS is now the CCP (content collaboration platform), releasing their Gartner Content Collaboration Platforms MarketPeer Insights report in April 2019. Continue reading →

Category Archives: Data