Storage in a shiny multi-cloud space

The era of multi-cloud infrastructure-as-a-service (IaaS) is not here (yet), no matter what the technology marketers want you to believe. The hype, the vapourware, the frenzy. It is what they do. The same goes for technology analysts, who describe visions and futures, and the high-level constructs and strategies to get there. Multi-cloud is often hyped as running applications and infrastructure services seamlessly across several public clouds such as Amazon AWS, Microsoft® Azure and Google Cloud Platform, and linking them to on-premises data centers and private clouds. Hybrid is the new black.

Multicloud connectivity to public cloud providers and on-premises private cloud

Multi-Cloud, on-premises, public and hybrid clouds

And the aspiration of multi-cloud is the right one, when it is truly ready. Gartner® wrote a high-level article titled “Why Organizations Choose a Multicloud Strategy“. Taking advantage of each individual cloud’s strengths and resiliency in its respective geographies makes good business sense, but there are many other considerations that cannot be an afterthought. In this blog, we look at a few of them from a data storage perspective.

In the beginning there was … 

For this storage dinosaur, data storage and compute have always been coupled as one. In the mainframe DASD days, these two were together. Even with the rise of networking architectures and protocols, from IBM SNA, DECnet, Ethernet & TCP/IP, and Token Ring FC-SAN (sorry, this is just a joke), the SANs and the filers stayed close to the servers, albeit with a buffered network layer in between.

A decade ago, when the public clouds started appearing, data storage and compute were mostly inseparable. There was a clear demarcation between public clouds and private clouds. The notion of hybrid clouds meant public clouds and private clouds could intermix with on-premises computing and data storage, but in almost all cases this was confined to a single public cloud provider. That held until these public cloud providers realized they could not convincingly entice the larger enterprises to move their IT out of their on-premises data centers into the cloud. So the public cloud providers reversed their strategy and peddled their cloud services back to on-premises. Today, Amazon AWS has Outposts; Microsoft® Azure has Arc; and Google Cloud Platform has launched Anthos.

Continue reading

Persistent Storage could stifle Google Anthos multi-cloud ambitions

To win in the multi-cloud game, you have to be in your competitors’ cloud. Google Cloud has been doing that since they announced Google Anthos just over a year ago. They have been crafting their “assault”, starting with on-premises, and Anthos on AWS. Anthos on Microsoft® Azure is coming, currently in preview mode.

Google CEO Sundar Pichai announcing Google Anthos at Next ’19

BigQuery Omni conversation starter

Two weeks ago, whilst the Google Cloud BigQuery Omni announcement was still under wraps, the local Malaysian IT portal Enterprise IT News sent me the embargoed article to seek my views and opinions. I have to admit that I was ignorant about the deeper workings of BigQuery, and had not fully gone through the workings of Google Anthos either. So I researched them.

Having done some small work on Qubida (now defunct) and Talend several years ago, I had grasped some useful data analytics and data enablement concepts, and so BigQuery fitted quite well into my understanding of BigQuery Omni. That triggered my interest to write this blog, meshing the persistent storage conundrum (at least for me it is something to be untangled) with Kubernetes, with GKE (Google Kubernetes Engine), and thus with Anthos as well.

For discussion’s sake, here is an overview of BigQuery Omni.

An overview of Google Cloud BigQuery Omni on multiple cloud providers
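Part of BigQuery Omni’s appeal is that the analyst-facing experience stays plain BigQuery even when the data physically lives in another cloud; the Omni query engine (running on Anthos clusters) moves to the data rather than the other way around. As a rough illustration of that client-side experience, here is a minimal sketch using the google-cloud-bigquery Python package; the project, dataset and table names are made-up assumptions, and the Omni-specific setup (connections to data sitting in AWS or Azure) is not shown.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Hypothetical project; with Omni, the table behind this query could be
# backed by data sitting in another cloud, but the call looks the same.
client = bigquery.Client(project="my-analytics-project")

query = """
    SELECT device_id, COUNT(*) AS events
    FROM `my-analytics-project.telemetry.events`
    GROUP BY device_id
    ORDER BY events DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.device_id, row.events)
```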

My comments and views are in this EITN article “Google Cloud’s BigQuery Omni for Multi-cloud Analytics”.
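And this is where the persistent storage conundrum comes in. Anthos workloads are Kubernetes workloads, and any stateful service they run has to claim storage from somewhere. Below is a minimal sketch with the official Kubernetes Python client, assuming a GKE or Anthos-managed cluster and an illustrative storage class name; the claim binds to storage provisioned in that one cloud or data center, which is precisely why persistent data does not move “seamlessly” across clouds.

```python
from kubernetes import client, config  # pip install kubernetes

# Assumes kubectl is already pointed at a GKE or Anthos-managed cluster.
config.load_kube_config()
core_v1 = client.CoreV1Api()

# Illustrative claim: 50 GiB of block storage for a stateful workload.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="analytics-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="standard-rwo",  # hypothetical storage class name
        resources=client.V1ResourceRequirements(requests={"storage": "50Gi"}),
    ),
)

core_v1.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```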

Continue reading

Falconstor Software Defined Data Preservation for the Next Generation

Falconstor® Software is gaining momentum. Given its arduous climb back to the fore, it is beginning to soar again.

Tape technology and Digital Data Preservation

I mentioned that long-term digital data preservation is a segment within the data lifecycle which has merits and prominence. SNIA® has shown through its 2007 and 2017 “100 Year Archive” surveys that this is a strong, growing market segment. The 3 critical challenges of this long, long-term digital data preservation are to keep the archives

  • Accessible
  • Undamaged
  • Usable

For the longest time, tape technology has been the king of the hill for digital data preservation. The technology is cheap and mature, and many enterprises have built their long-term strategies around it. And the pulse of the tape technology market is still very healthy.

The challenges of tape remain. Every 5 years or so, companies have to consider moving the data on the existing tape technology to the next generation. It is widely known that an LTO drive can read tapes of the previous 2 generations, and write to tapes one generation back. The tape transcription process of migrating digital data for the sake of data preservation is bad because it affects the structural integrity and quality of the content of the data.
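A tiny sketch, purely to illustrate the compatibility rule of thumb stated above (read back two generations, write back one) and why it forces a transcription treadmill; the generation numbers are plain integers, and newer LTO generations with reduced backward-read support are ignored here.

```python
def lto_can_read(drive_gen, tape_gen):
    """Rule of thumb: an LTO drive reads its own and the previous 2 generations."""
    return 0 <= drive_gen - tape_gen <= 2

def lto_can_write(drive_gen, tape_gen):
    """Rule of thumb: an LTO drive writes its own and the previous 1 generation."""
    return 0 <= drive_gen - tape_gen <= 1

# An LTO-7 drive can still read LTO-5 media, but an LTO-8 drive cannot,
# hence the periodic transcription exercise before the older drives retire.
assert lto_can_read(7, 5) and not lto_can_read(8, 5)
assert lto_can_write(7, 6) and not lto_can_write(7, 5)
```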

In my time covering Oil & Gas subsurface data management, I have seen NOCs (national oil companies) with 500,000 tapes of all generations, from 1/2″ to DDS, DAT to SDLT, 3590 to LTO 1-7. Millions are spent transcribing these tapes every few years, and folks like Katalyst DM, Troika and others hover over this landscape for their fill.

Continue reading

The Falcon to soar again

One of the historical feats which had me mesmerized for a long time was the 14-year journey China’s imperial treasures took to escape the Japanese invasion in the early 1930s, sandwiched between rebellions and civil wars in China. More than 20,000 pieces of the imperial treasures took a perilous journey to the west and back again. Divided into 3 routes over a decade and four years, not a single piece of treasure was broken or lost. All in the name of preservation.

Today, those 20,000-over pieces live in perpetuity in 2 palaces – the Beijing Palace Museum in China and the National Palace Museum in Taipei, Taiwan.

Digital data preservation

Digital data preservation is at the other end of the data lifecycle spectrum. More often than not, it is not the part that many pay attention to. In the past 2 decades, digital data has grown so much that it is now paramount to keep the data forever. Mind you, this is not data hoarding, but preserving the knowledge and wisdom that is in the digital content of the data.

[ Note: If you are interested to know more about Data -> Information -> Knowledge -> Wisdom, check out my 2015 article on LinkedIn ]

SNIA (Storage Networking Industry Association) conducted 2 surveys – one in 2007 and another in 2017 – called the 100 Year Archive, and found that the requirement for preserving digital data has grown many times over during those 10 years. In the end, the final goal is to ensure that the perpetual digital contents are

  • Accessible
  • Undamaged
  • Usable

All at an affordable cost. Therefore, SNIA has the vision that the digital content must transcend the storage medium, the storage system and the technology that holds it.

The Falcon reemerges

A few weeks ago, I had the privilege to speak with Falconstor® Software‘s David Morris (VP of Global Product Strategy & Marketing) and Mark Delsman (CTO). It was my first engagement with Falconstor® in almost 9 years! I wrote a piece on Falconstor® in my blog in 2011.

Continue reading

4 Digital Workplace Moves after COVID-19

[ Note: This article was published on LinkedIn on March 24, 2020. Here is the link to the original article ]

We live in unprecedented times. Malaysia is on Day 7 of the Movement Control Order (MCO), which is basically a controlled lockdown of movements and activities. In many cases, businesses have ground to a halt, and the landscape has changed forever. The “office” will not always be a premises anymore, and “meetings” will not always be physical face-to-face conversations to build relationships and trust.

Trust is vital. A couple of weeks ago, I wrote about 關係 (Guan Xi), and having to re-invent Trust in a Digital World.


The impact on organizations and businesses is deep and powerful, and so, as we move forward after the COVID-19 pandemic dies down, organizations’ Digital Transformation strategies will change as well.

Here are 4 technology areas which I think must take precedence for the Digital Workplace in the Digital Transformation strategy.

Software-Defined Wide Area Network (SD-WAN)

Physical connections have been disrupted. Digital connections are on the rise to supplant “networking” in our physical business world, and the pandemic situation has just tipped the scale.

Many small and medium businesses (SMBs) rely on home broadband, which may be good enough for some. Medium to large organizations have business broadband. Larger organizations with deeper pockets might already have MPLS (multiprotocol label switching) or leased lines in place. A large portion might have VPN (virtual private network) set up too.

In time, SD-WAN (software-defined wide area network) services should be considered more seriously. SD-WAN is a more prudent approach that inculcates digital workplace policies such as quality of service (QoS) for critical data connections and the allocation of network attributes to different data workloads and network traffic; it includes VPN features, and most offerings come with enhanced security add-ons as well.

In addition to performance, security and capacity control, an SD-WAN implementation helps shape employees’ digital workplace practices and, most importantly, redefines the organization’s processes and conditions employees’ mindsets in the Digital Transformation journey.

Video Meetings & Conferencing

Video meetings and conferencing solutions have become the poster children of the present pandemic situation. Zoom, Webex, Microsoft Teams, Skype (it is going away), GoToMeeting and more are dominating the new norm of work. Work from home (WFH) has a totally new meaning now, especially for employees who have been conditioned to work in an “office”.

I had more than 15 Zoom meetings (on the free version) last week when the Malaysian MCO started, and Zoom has become a critical part of my business now. Thus, it is time to consider paid solutions like Zoom or Webex as part of an organization’s Digital Workplace plans. These will create the right digital culture for the new Digital Workplace.

Personally, I like UberConference because of their on-hold song. It is sung by their CEO, Alex Cornell. Check out this SoundCloud recording.

File Sharing

Beneath the hallowed halls of video meetings and conferencing, collaboration happens with data shared in files. We have lived with files and folders on our C: drives, NAS home directories and file servers’ shared drives for so long that these processes are almost second nature to us.

In the face of this COVID-19 pandemic, file and information sharing has become cumbersome. The shared drive is no longer within our network, because we are not on the organization’s LAN and intranet anymore. We are working at home, away from the gigabit network protected by the organization’s firewall, where we were once slaved … err … I mean supported by our IT admins.

The obvious reaction (since you can’t pass thumb drives around at present) is to resort to Dropbox, OneDrive, Google Drive and others, hoping you won’t max out your free capacity. Or to send attachments in emails going back and forth, hoping the mail server will not reject files larger than 10MB.

The fortunate ones have a VPN client on their laptops, but the network traffic backhauls to the central VPN server and overloads it to the max. Pretty soon, network connections are dropped, and the performance of file sharing sucks! Big time!

What if your organization is a bank? Or an Oil & Gas company where data protection and data sovereignty dictate the order of the day? All the very-public enterprise file sync and share (EFSS) services like Dropbox, OneDrive or Google Drive totally violate the laws of the land, and your organization may be crippled by the inability to do work. After all, files and folders are like the peanut-butter-jelly or the nasi lemak-teh tarik (coconut rice & pulled tea Malaysian breakfast) combo of work. You can’t live without files and folders.

The thought of having a PRIVATE on-premises EFSS solution in your organization’s Digital Transformation strategy should be moved from the KIV (keep in view) tray to a defined project in the Digital Transformation programme.

At Katana Logic, we work with Easishare, and it is worth having a serious plan about building your own private file share and sync solution as part of the Digital Workplace.

Security

In such unprecedented times, when our attention is diverted, cybersecurity threats are at their highest. Financial institutions in Malaysia have already been summoned by Bank Negara Malaysia, the central bank, to build the industry’s expectations and confidence through the RMiT framework. Conversations with some end users and IT suppliers to Malaysian banks and other financial institutions, unfortunately, revealed the typical lackadaisical attitude towards fortifying cyber resiliency practices within these organizations. I would presume the importance of cybersecurity and cyber resiliency practices takes an even further back seat with small and medium businesses.

On a pessimistic note, ransomware and DDoS (distributed denial-of-service) attacks have been on the rise, taking advantage of this pandemic situation. NAS, the network attached storage that serves the organization’s shared files and folders, has become ransomware’s favourite target, as I have written in my blog.

But cybersecurity does not have to be an expensive affair. Applying a consistent, periodic password change policy, educating employees about phishing emails, and using simple but free port scanners to look for open TCP/UDP ports can be invaluable for small and medium businesses. Subscribing to penetration testing (pentest) services at a regular frequency is immensely helpful as well.
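For a flavour of how little it takes, here is a minimal sketch of such a port check in Python; the host address and port list are placeholders, it only tests TCP connects (not UDP), and a proper scanner like nmap remains the better tool for anything beyond a quick sanity check.

```python
import socket

# Placeholder host and a handful of commonly exposed TCP ports.
TARGET_HOST = "192.0.2.10"  # RFC 5737 documentation address; replace with your own
COMMON_PORTS = {22: "ssh", 80: "http", 443: "https", 445: "smb", 3389: "rdp"}

def scan_host(host, timeout=1.0):
    """Return the ports in COMMON_PORTS that accept a TCP connection on `host`."""
    open_ports = []
    for port in COMMON_PORTS:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            if sock.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    for port in scan_host(TARGET_HOST):
        print(f"port {port}/tcp ({COMMON_PORTS[port]}) is open")
```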

In larger organizations, cyber resiliency is more holistic. Layers of defense in depth, the CIA (confidentiality, integrity, availability) triad and AAA (authentication, authorization, audit) proactive measures are all part of the cybersecurity framework. These holistic practices must effect change in people and in the processes of how data and things are shared, used, protected and recovered in the whole scheme of things.

Thus organizations must be vigilant and do their due diligence. We must never hesitate to fortify cybersecurity and cyber resiliency in the Digital Workplace.

Parting thoughts

We are at the most vulnerable stage of our lifetime, but it is also almost the best time to understand what is critical to our business. This pandemic is helping to identify the right priorities for Work.

At any level, regardless, organizations have to take advantage of this COVID-19 situation to assess how it has impacted business. They must look at what worked and what did not in their digital transformation journey so far, and change the parts that were not effective.

I have looked at the 4 areas of technology that I feel could make a difference, and I am sure there are many more areas to address. So, take these pessimistic times and turn them into optimistic ones when we are back to normalcy. The Digital Workplace has changed forever, and for the better too.

Continue reading

DellEMC Project Nautilus Re-imagine Storage for Streams

[ Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in Silicon Valley, USA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer, and I was not obligated to blog or promote the vendors’ technologies presented at this event. The content of this blog is of my own opinions and views ]

Cloud computing will have challenges processing data at the outer reach of its tentacles. Edge Computing, as it melds with the Internet of Things (IoT), needs a different approach to data processing and data storage. Data generated at source has to be processed at source, to respond to the event or events which have happened. Cloud Computing, even with 5G networks, has latency that is not sufficient for how an autonomous vehicle reacts to pedestrians on the road at speed, how a sprinkler system is activated in a fire, or even how a fraud detection system signals money laundering activities as they occur.

Furthermore, not all sensors, devices, and IoT end-points are connected to the cloud at all times. To understand this new way of data processing and data storage, have a look at this video by Jay Kreps, CEO of Confluent, the company behind Apache Kafka®, for this new perspective.

Data is continuously and infinitely generated at source, and this data has to be compiled, controlled and consolidated with nanosecond precision. At Storage Field Day 19, an interesting open source project, Pravega, was introduced to the delegates by DellEMC. Pravega is an open source storage framework for streaming data and is part of Project Nautilus.

Rise of streaming time series data

Processing data at source has a lot of advantages and this has popularized Time Series analytics. Many time series and streams-based databases such as InfluxDB, TimescaleDB, OpenTSDB have sprouted over the years, along with open source projects such as Apache Kafka®, Apache Flink and Apache Druid.

The data generated at source (end-points, sensors, devices) is serialized, timestamped (as the event occurs), continuous and infinite. These are the properties of a time series data stream, and to make sense of the streaming data, new data formats such as Avro, Parquet and ORC pepper the landscape alongside the more mature JSON and XML, each with its own strengths and weaknesses.
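To ground the idea of a serialized, timestamped, continuous stream, here is a minimal sketch of a producer appending JSON-encoded sensor events to an Apache Kafka® topic with the kafka-python package; the broker address, topic name and event fields are illustrative assumptions, and a real pipeline would more likely use Avro with a schema registry, or a Pravega stream as introduced above.

```python
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

# Illustrative broker address; events are serialized as JSON for simplicity.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def emit(sensor_id, value):
    """Timestamp one reading at source and append it to the stream."""
    event = {
        "sensor": sensor_id,
        "value": value,
        "ts": time.time_ns(),  # event time, captured as the event occurs
    }
    producer.send("sensor-readings", value=event)  # hypothetical topic name

emit("pump-07", 3.14)
producer.flush()  # ensure the buffered event is actually sent to the broker
```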

You can learn more about these data formats in the 2 links below:

DIY is difficult

Many time series projects started as DIY projects in many organizations, and many of them are still DIY projects in production systems as well. They depend on tribal knowledge, and these databases are tied to unmanaged storage which is not congruent with the properties of streaming data.

At the storage end, the technologies today still rely on the SAN and NAS protocols and, in recent years, S3 with object storage. Block, file and object storage introduce layers of abstraction which may not be a good fit for streaming data.

Continue reading

AI needs data we can trust

[ Note: This article was published on LinkedIn on Jan 21st, 2020. Here is the link to the original article ]

In 2020, the intensity on the topic of Artificial Intelligence will further escalate.

One piece of news which came out last week terrified me. The Sarawak courts want to apply Artificial Intelligence to mete out judgment and punishment, perhaps on a small scale.

Continue reading

NAS is the next Ransomware goldmine

I get an email like this almost every day:

It is from the daily security run logs of one of my FreeNAS customers, emailed to our support@katanalogic.com alias. It shows a brute force attack attempting to crack the authentication barrier via the exposed SSH port.

Just days after the installation was completed months ago, a bot started doing IP port scans on our system and found the SSH port open. (We use it for remote support.) It has been trying ever since, and we have been observing the source IP addresses.
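As a rough illustration of what we do with those daily security run logs, here is a minimal sketch that tallies failed SSH logins per source IP from an OpenSSH-style auth log; the log path, line format and threshold are assumptions, and a production setup would rather use something like fail2ban, or simply close the SSH port behind a VPN.

```python
import re
from collections import Counter

# Assumed OpenSSH-style log line, e.g.:
#   "Failed password for root from 203.0.113.45 port 52344 ssh2"
FAILED_LOGIN = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")

def suspicious_sources(log_path, threshold=10):
    """Count failed SSH logins per source IP and keep those at or above the threshold."""
    hits = Counter()
    with open(log_path, errors="ignore") as log:
        for line in log:
            match = FAILED_LOGIN.search(line)
            if match:
                hits[match.group(1)] += 1
    return Counter({ip: n for ip, n in hits.items() if n >= threshold})

if __name__ == "__main__":
    for ip, count in suspicious_sources("/var/log/auth.log").most_common():
        print(f"{ip}: {count} failed SSH logins")
```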

The new Ransomware attack vector

This is not surprising to me. Ransomware has become more sophisticated and more damaging than ever because the monetary returns from ransomware are far more lucrative than those from other cybersecurity threats so far. And the easiest prey is the weakest link in the People, Process and Technology chain. Phishing breaches through social engineering and email are the most common attack vectors, but there are vishing (via voice calls) and smishing (via SMS) out there too. Of course, we do not discount other attack vectors such as malvertising sites, exploits and so on. Anything to deliver the ransomware payload.

The new attack vector is via NAS (Network Attached Storage), and it is easy to understand why.

Continue reading