Data – Page 5 – Storage Gaga

NAS is the next Ransomware goldmine

By cfheoh | January 7, 2020 - 5:58 am |January 7, 2020 Algorithm, Analytics, API, Artificial Intelligence, Backup, Business Continuity, CIFS, ClamAV, Cloud, compression, Containers, Data, Data Archiving, Data Availability, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Disaster Recovery, Filesystems, FreeNAS, iXsystems, Machine Learning, NAS, Object Storage, QNAP, Reliability, Security, SMB, TrueNAS

2 Comments

I get an email like this almost every day:

It is from one of my FreeNAS customers daily security run logs, emailed to our support@katanalogic.com alias. It is attempting a brute force attack trying to crack the authentication barrier via the exposed SSH port.

Just days after the installation was completed months ago, a bot has been doing IP port scans on our system, and found the SSH port open. (We used it for remote support). It has been trying every since, and we have been observing the source IP addresses.

The new Ransomware attack vector

This is not surprising to me. Ransomware has become more sophisticated and more damaging than ever because the monetary returns from the ransomware are far more effective and lucrative than other cybersecurity threats so far. And the easiest preys are the weakest link in the People, Process and Technology chain. Phishing breaches through social engineering, emails are the most common attack vectors, but there are vhishing (via voicemail) and smshing (via SMS) out there too. Of course, we do not discount other attack vectors such as mal-advertising sites, or exploits and so on. Anything to deliver the ransomware payload.

The new attack vector via NAS (Network Attached Storage) and it is easy to understand why.

Continue reading →

Green Storage? Meh!

By cfheoh | December 3, 2019 - 11:19 am |December 3, 2019 Amazon Web Services, Analytics, API, Artificial Intelligence, Backup, Big Data, Business Continuity, Cloud, Composable Infrastructure, Data, Data Fabric, Data Management, Data Protection, Data Security, Deduplication, Digital Transformation, Disaster Recovery, Disks, Edge Computing, Google, Green, Green Computing, High Performance Computing, Hyperconvergence, IoT, Liqid, Machine Learning, Openstack, Oracle Cloud, Performance Benchmark, Performance Caching, Rackspace, Software Defined Storage, Software-defined Datacenter, Solid State Devices, Storage Optimization, Storage Tiering, Strongbox, Tape storage, Unified Storage, virtual tape library

1 Comment

Something triggered my thoughts a few days ago. A few of us got together talking about climate change and a friend asked how green was the datacenter in IT. With cloud computing booming, I would say that green computing isn’t really the hottest thing at present. That in turn, leads us to one of the most voracious energy beasts in the datacenter, storage. Where is green storage in the equation?

What is green?

Over the past decade, several storage related technologies were touted as more energy efficient. These include

Tape – when tapes are offline, they do not consume power and do not require cooling
Virtualization – Virtualization reduces the number of servers and desktops, and of course storage too
MAID (Massive Array of Independent Disks) – the arrays spin down the HDDs if idle for a period of time
SSD (Solid State Drives) – Compared to HDDs, SSDs consume much less power, and overall reduce the cooling needs
Data Footprint Reduction – Deduplication, compression and other technologies to reduce copies of data
SMR (Shingled Magnetic Recording) Drives – Higher areal density means less drives but limited by physics.

The largest gorilla in storage technology

HDDs still dominate the market and they are the biggest producers of heat and vibration in a storage array, along with the redundant power supplies and fans. Until and unless SSDs dominate, we have to live with the fact that storage disk drives are not green. The statistics from Statistica below forecasts that in 2021, the shipment of SSDs will surpass HDDs.

Today the areal density of HDDs have increased. With SMR (shingled magnetic recording), the areal density jumped about 25% more than the 1Tb/inch (Terabit per inch) in the CMR (conventional magnetic recording) drives. The largest SMR in the market today is 16TB from Seagate with 18TB SMR in the horizon. That capacity is going to grow significantly when EAMR (energy assisted magnetic recording) – which counts heat assisted and microwave assisted – drives enter the market next year. The areal density will grow to 1.6Tb/inch with a roadmap to 4.0Tb/inch. Continue reading →

Veaam to boost Cloud Data Management

By cfheoh | October 24, 2019 - 8:04 pm |October 24, 2019 Analytics, Backup, Business Continuity, Cloud, Cohesity, Commvault, Containers, Data, Data Archiving, Data Availability, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Disaster Recovery, Forrester, Gartner, Kubernetes, LTFS, LTO, LTO-8, NAS, Reliability, Snapshots, Veeam, Veritas, Virtualization, VMware

Leave a comment

Cloud Data Management is a tricky word. Often vague, ambigious, how exactly would you define “Cloud Data Management“?

Fresh off the boat from Commvault GO 2019 in Denver, Colorado last week, I was invited to sample Veeam a few days ago at their Solution Day and soak into their rocketing sales in Asia Pacific, and strong market growth too. They reported their Q3 numbers this week, impressing many including yours truly.

I went to the seminar early in the morning, quite in awe of their vibrant partners and resellers activities and ecosystem compared to the tepid Commvault efforts in Malaysia over the past decade. Veeam’s presence in Malaysia is shorter than Commvault’s but they are able to garner a stronger following with partners and customers alike.

Continue reading →

Digital Transformation means Change in People

By cfheoh | July 2, 2019 - 6:46 am |July 2, 2019 Analytics, Artificial Intelligence, Big Data, Data, Data Management, Data Privacy, Data Security, Deep Learning, Digital Transformation, Industry 4.0, IoT, Machine Learning, Security

Leave a comment

I wrote about Digital Transformation a few weeks ago. In the heart of it, People are the real key to the transformation of every organization. Following up what I described earlier, Change is the factor that People in every organization have to embrace.

Drowning and going blind

We are swarmed by technology. We are inundated with everything digital and we are attracted to the latest buzz and hype. In the sea of it all, these things have made us, the People reliant of technology. This reliance, this needy dependency, has made us complacent. We settle because the boring and mundane tasks have been taken away from us. Moreover, the constant firehose feeding our lives has created “digital drowning“, a situation I would like describe as gasping for a breather to think clearly. We are bogged by digital quagmire, blinded by what shiny things and we lose sight of the strategic focus.

We shrivel and we go back to what we think is our comfort zone.

Change is constant and uncomfortable

I once read that our known comfort zone is no longer our safety zone. That idea of everyone’s safety zone has been obliterated aeons ago. I love the following quote from Seth Godin, my absolute marketing guru.

As he rightly pointed out, “There is no ‘ever after’. There’s just the chaos of now“. We don’t arrive at a comfortable place after the change. There is no comfortable place or safety place for that matter … at all. The Digital Transformation or what ever Information Age we described our generation earlier, is constant change. We have to ride the hungry bear and we have to saddle the ferocious dragon at all times. We have to learn to ride the bucking bronco!

So, we learn. We change and change. Continue reading →

Storage Performance Considerations for AI Data Paths

By cfheoh | June 17, 2019 - 10:50 am |June 17, 2019 100Gigabit Ethernet, Algorithm, Analytics, API, Artificial Intelligence, Big Data, Cloud, Composable Infrastructure, Data, Data Fabric, Data Management, Data Privacy, Data Security, Digital Transformation, Drivescale, E8 Storage, Edge Computing, Elastifile, Excelero, Filesystems, High Performance Computing, Hyperconvergence, Industry 4.0, Infiniband, Intel, Liqid, Lustre, Machine Learning, Mellanox Technologies, NVMe, Object Storage, Performance Benchmark, Performance Caching, Quantum Corporation, RDMA, Software-defined Datacenter, Storage Optimization, Storage Tiering, ThinkParq, Vast Data, Virtualization, WekaIO

1 Comment

The hype of Deep Learning (DL), Machine Learning (ML) and Artificial Intelligence (AI) has reached an unprecedented frenzy. Every infrastructure vendor from servers, to networking, to storage has a word to say or play about DL/ML/AI. This prompted me to explore this hyped ecosystem from a storage perspective, notably from a storage performance requirement point-of-view.

One question on my mind

There are plenty of questions on my mind. One stood out and that is related to storage performance requirements.

Reading and learning from one storage technology vendor to another, the context of everyone’s play against their competitors seems to be “They are archaic, they are legacy. Our architecture is built from ground up, modern, NVMe-enabled“. And there are more juxtaposing, but you get the picture – “We are better, no doubt“.

Are the data patterns and behaviours of AI different? How do they affect the storage design as the data moves through the workflow, the data paths and the lifecycle of the AI ecosystem?

Continue reading →

Did Cloud Kill LTFS?

By cfheoh | May 21, 2019 - 10:39 am |May 21, 2019 Acquisition, Backup, Business Continuity, Cloud, Data, Data Archiving, Data Availability, Data Domain, Data Management, Data Protection, Deduplication, DellEMC, Disks, EMC, ExaGrid, Falconstor, Filesystems, HDS, HPE, IBM, LTO, NAS, NetApp, Object Storage, Quantum Corporation, Strongbox, Veritas

2 Comments

I like LTFS (Linear Tape File System). I was hoping it would take off but it has not. And looking at its future, its significance is becoming less and less relevant. I look if Cloud has been a factor in the possible demise of LTFS in the next few years.

What is LTFS?

In a nutshell, Linear Tape File System makes LTO tapes look like a disk with a file system. It takes a tape and divides it into 2 partitions:

Index Partition (XML Index Schema with file names, metadata and attributes details)
Data Partition (where the data resides)

Diagram from https://www.snia.org/sites/default/orig/SDC2011/presentations/tuesday/DavidPease_LinearTape_File_System.pdf

It has a File System module which is implemented in supported OS of Unix/Linux, MacOS and Windows. And the mounted file system “tape partition” shows up as a drive or device.

Assassination attempts

There were many attempts to kill off tapes and so far, none has been successful.

Among the “tape-killer” technologies, I think the most prominent one is the VTL (Virtual Tape Library). There were many VTLs I encountered during my days in mid-2000s. NetApp had Alacritus and EMC had Clariion Disk Libraries. There were also IBM ProtecTIER, FalconStor VTL (which is still selling today) among others and Sepaton (read in reverse is “No Tapes’). Sepaton was acquired by Hitachi Data Systems several years back. Continue reading →

Is AI my friend?

By cfheoh | April 19, 2019 - 7:29 am |April 19, 2019 Algorithm, Analytics, Artificial Intelligence, Data, Data Corruption, Data Management, Data Privacy, Data Protection, Data Security, Deep Learning, Google, Machine Learning

Leave a comment

I am sorry, Dave …

Let’s start this story with 2 supposed friends – Dave and Hal.

How do we become friends?

We have friends and we have enemies. We become friends when trust is established. Trust is established when there is an unsaid pact, a silent agreement that I can rely on you to keep my secrets private. I will know full well that you will protect my personal details with a strong conviction. Your decisions and your actions towards me are in my best interest, unbiased and would benefit both me and you.

I feel secure with you.

AI is my friend

When the walls of uncertainty and falsehood are broken down, we trust our friends more and more. We share deeper secrets with our friends when we believe that our privacy and safety are safeguarded and protected. We know well that we can rely on them and their decisions and actions on us are reliable and unbiased.

AI, can I count on you to protect my privacy and give me security that my personal data is not abused in the hands of the privileged few?

AI, can I rely on you to be ethical, unbiased and give me the confidence that your decisions and actions are for the benefit and the good of me, myself and I?

My AI friends (maybe)

As I have said before, I am not a skeptic. When there is plenty of relevant, unbiased data fed into the algorithms of AI, the decisions are fair. People accept these AI decisions when the degree of accuracy is very close to the Truth. The higher the accuracy, the greater the Truth. The greater the Truth, the more confident people are towards the AI system.

Here are some AI “friends” in the news:

But we have to careful here as well. Accuracy can be subjective, paradoxical and enigmatic. When ethics are violated, we terminate the friendship and we reject the “friend”. We categorically label him or her as an enemy. We constantly have to check, just like we might, once in a while, investigate on our friends too.

In Conclusion

AI, can we be friends now?

[Apology: sorry about the Cyberdyne link 😉 ]

[This blog was posted in LinkedIn on Apr 19th 2019]

Figuring out storage for Kubernetes and containers

By cfheoh | April 16, 2019 - 6:44 pm |April 16, 2019 Amazon Web Services, API, Cloud, Containers, Data, Data Archiving, Data Availability, Data Management, Data Protection, Data Security, Docker, Edge Computing, Elastifile, Fibre Channel, Filesystems, Google, Hyperconvergence, Kubernetes, NFS, Openstack, Performance Benchmark, Performance Caching, Robin.io, Snapshots, Software Defined Storage, Storage Optimization, Virtualization, VMware

2 Comments

Oops! I forgot about you!

To me, containers and container orchestration (CO) engines such as Kubernetes, Mesos, Docker Swarm are fantastic. They scale effortlessly and are truly designed for cloud native applications (CNA).

But one thing irks me. Storage management for containers and COs. It was as if when they designed and constructed containers and the containers orchestration (CO) engines, they forgot about the considerations of storage and storage management. At least the persistent part of storage.

Over a year ago, I was in two minds about persistent storage, especially when it comes to the transient nature of microservices which was so prevalent and were inundating the cloud native applications landscape. I was searching for answers in my blog. The decentralization of microservices in containers means mass deployment at the edge, but to have the pre-processed and post-processed data stick to the persistent storage at the edge device is a challenge. The operative word here is “STICK”.

Two different worlds

Containers were initially designed and built for lightweight applications such as microservices. The runtime, libraries, configuration files and dependencies are all in one package. They were meant to do simple tasks quickly and scales to thousands easily. They could be brought up and brought down in little time and did not have to bother about the persistent data stored by the host. The state of the containers were also not important to the application tasks at hand.

Today containers like Docker have matured to run enterprise applications and the state of the container is important. The applications must know the state and the health of the container. The container could be in online mode, online but not accepting data mode, suspended mode, paused mode, interrupted mode, quiesced mode or halted mode. Each mode or state of the container is important to the running applications and the container can easily brought up or down in an instance of a command. The stateful nature of the containers and applications is critical for the business. The same situation applies to container orchestration engines such as Kubernetes.

Container and Kubernetes Storage

Docker provides 3 methods to local storage. In the diagram below, it describes:

Continue reading →

Data Privacy First before AI Framework

By cfheoh | April 16, 2019 - 9:51 am |April 16, 2019 Algorithm, Analytics, API, Artificial Intelligence, Big Data, Data, Data Privacy, Data Protection, Data Security, Deep Learning, Industry 4.0, Machine Learning, Microsoft

Leave a comment

A few days ago, I discovered that Malaysia already had plans for a National Artificial Intelligence (AI) Framework. It is led by Malaysia Digital Economy Corporation (MDEC) and it will be ready by the end of 2019. A Google search revealed a lot news and announcements, with a few dating back to 2017, but little information of the framework itself. Then again, Malaysia likes to take the “father knows best” approach, and assumes that what it is doing shouldn’t be questioned (much). I will leave this part as it is, because perhaps the details of the framework is under the OSA (Official Secrets Act).

Are we AI responsible or are we responsible for AI?

But I would like to highlight the data privacy part that is likely to figure strongly in the AI Framework, because the ethical use of AI is paramount. It will have economical, social and political impact on Malaysians, and everybody else too. I have written a few articles on LinkedIn about ethics, data privacy, data responsibility, impact of AI. You can read about them in the links below:

Data Responsibility – September 2017
Can we save ourselves from AI? – February 2018
Ethical Use of Data – April 2018
Malaysia, when will you take Data Privacy seriously? – April 2019

I may sound like a skeptic of AI. I am not. I believe AI will benefit mankind, and bring far reaching developments to the society as a whole. But we have to careful and this is my MAIN concern when I voice about AI. I continue to question the human ethics and the human biases that go into the algorithms that define AI. This has always been the crux of my gripes, my concerns, my skepticism of everything we call AI. I am not against AI but I am against the human flaws that shape the algorithms of AI.

Everything is a Sheep (or a Giraffe)

A funny story was shared with me last year. It was about Microsoft Azure computer vision algorithm in recognizing visuals in photos. Apparently the algorithm of the Microsoft Azure’s neural network was fed with some overzealous data of sheep (or giraffes), and the AI system started to point out that every spot that it “saw” was either a sheep, or any vertical long ones was a giraffe.

In the photo below, there were a bunch of sheep on a tree. Check out the tags/comments in the red rectangle published by the AI neural network software below and see how both Microsoft Azure and NeutralTalk2 “saw” in the photo. You can read more about the funny story here.

This proves my point that if you feed the learning system and the AI behind it with biased and flawed information, the result can be funny (in this case here) or disastrous. Continue reading →

We got to keep more data

By cfheoh | April 4, 2019 - 4:47 pm |April 4, 2019 Analytics, Artificial Intelligence, Big Data, Data, eDiscovery, Machine Learning

1 Comment

Guess which airport has won the most awards in the annual Skytrax list? Guess which airport won 480 awards since its opening in 1981? Guess how this airport did it?

Data Analytics gives the competive edge.

Serving and servicing more than 65 million passengers and travellers in 2018, and growing, Changi Airport Singapore sets a very high level customer service. And it does it with the help of technology, something they call Smart (Service Management through Analytics and Resource Transformation) Airport. In an ultra competitive and cut-throat airline business, the deep integration of customer-centric services and the ultimate traveller’s experience are crucial to the survival and growth of airlines. And it has definitely helped Singapore Airlines to be the world’s best airlines in 2018, its 4th win.

To achieve that, Changi Airport relies on technology and lots of relevant data for deep insights on how to serve its customers better. The details are well described in this old news article.

Keep More Relevant Data for Greater Insights

When I mean more data, I do not mean every single piece of data. Data has to be relevant to be useful.

How do we get more insights? How can we teach systems to learn? How to we develop artificial intelligence systems? By having more relevant data feeding into data analytics systems, machine learning and such.

As such, a simple framework for building from the data ingestion, to data repositories to outcomes such as artificial intelligence, predictive and recommendations systems, automation and new data insights isn’t difficult to understand. The diagram below is a high level overview of what I work with most of the time. Continue reading →

Category Archives: Data

Digital Transformation means Change in People

Drowning and going blind

Change is constant and uncomfortable