btrfs butter gone bad?

I wrote about btrfs 8 years ago.

Since then, it has made its way into several small to mid-end storage solutions (more NAS inclined solutions) including Rockstor, Synology, Terramaster, and Asustor. In the Linux world, SUSE® Linux Enterprise Server and OpenSUSE® use btrfs as the default OS file system. I have decided to revisit btrfs filesystem to give some thoughts about its future.

Have you looked under the hood?

The sad part is not many people look under the hood anymore, especially for the market the btrfs storage vendors are targeting. The small medium businesses just want a storage which is cheap. But cheap comes at a risk where the storage reliability and data integrity are often overlooked.

The technical conversation is secondary and thus the lack of queries for strong enterprise features may be leading btrfs to be complacent in its development.

Continue reading

Time for Fujitsu Malaysia to twist and shout and yet …

The worldwide storage market is going through unprecedented change as it is making baby steps out of one of the longest recessions in history. We are not exactly out of the woods yet, given the Eurozone crisis, slowing growth in China and the little sputters in the US economy.

Back in early 2012, Fujitsu has shown good signs of taking market share in the enterprise storage but what happened to that? In the last 2 quarters, the server boys in the likes of HP, IBM and Dell storage market share have either shrunk (in the case of HP and Dell) or tanked (as in IBM). I would have expected Fujitsu to continue its impressive run and continue to capture more of the enterprise market, and yet it didn’t. Why?

I was given an Eternus storage technology update by the Fujitsu Malaysia pre-sales team more than a year ago. It has made some significant gains in technology such as Advanced Copy, Remote Copy, Thin Provisioning, and Eco-Mode, but I was unimpressed. The technology features were more like a follower, since every other storage vendor in town already has those features.

Continue reading

AoE – All about Ethernet!

This is long overdue.

A reader of my blog asked if I could do a piece on Coraid. Coraid who?

This name is probably a name not many people heard of in Malaysia. Even most the storage guys that I talk to never heard of it.

I have known about Coraid for a few years now (thanks to my incessant reading habits), looking at it from nonchalant point of view.  But when the reader asked about Coraid, I contacted Kevin Brown, CEO of Coraid, whom I am not exactly sure how I was connected through LinkedIn. Kevin was very responsive and got one of their Directors to contact me. Kaushik Shirhatti was his name and he was very passionate to share their Coraid technology with me. Thanks Kevin and Kaushik!

That was months ago but the thought of writing this blog post has been lingering. I had to scratch the itch. 😉

So, what’s up with Coraid? I can tell that they are different but seems to me that their entire storage architecture is so simple that it takes a bit of time for even storage guys to wrap their head around it. Why do I say that?

For storage guys (like me), we are used to layers. One of the memorable movie quotes I recalled was from Shrek: “Orges are like onions! Onions have layers!“.

Continue reading

Run free … Symantec FileStore

It has been a rough and tough 3 weeks and I missed writing my blog. Last week, the toughest of the 3, was my CompTIA Storage+ training to Symantec SEs in Malaysia. They were a great crowd, and I loved it but I was really tired after that.

One exciting news during that week was the ouster of long time employee, and CEO of Symantec, Enrique Salem and replacing him with Steve Bennett, their Chairman. The news of that unfortunate event can be read from here and here. And almost hours after that, the calls to break up the Veritas portion of Symantec came up and putting pressure on the board of directors in Symantec to either spin-off the entity or sell it off.

To be fair, many observers, including me, believed that the marriage between Symantec and Veritas in 2005 wasn’t really what you would call a “match made in heaven”. It was more like strange bedfellows to me. And there was an internal joke (one that I could not verify) about the Veritas CEO, Gary Bloom’s promise to the Veritas board when he joined them from Oracle in 2000.

It went like this:

Gary Bloom promised the Veritas board of directors in 2000 that he would be able to bring Veritas to a USD$5 billion dollar company in 5 years time. Nearing the end of the 5 years in 2005, Gary fulfilled his promise by merging with Symantec, instantly making Veritas a USD$5 billion dollar company.”

Note: This is just an inside joke which I heard from a Veritas friend back in 2005, and by no means put Gary Bloom in a bad light. If I did, I apologize.

But back to the present. Our class last week brought up the subject of Symantec FileStore. When it first came out in October 2009, I thought it was an interesting solution. For once, I thought there was something could “out filesystem” NetApp’s ONTAP and WAFL, because Veritas had one of the best scale-out, clustered file systems. They just haven’t figured out the front end protocols yet, where NAS and iSCSI reigned. Veritas File System (VxFS) and Veritas Cluster File System as part of Veritas Cluster Server (VCS) was mature and proven in the enterprise. Along with Veritas Volume Manager (VxVM), this was perhaps THE best file system/volume management suite around. Mind you, ZFS hasn’t reached the level of prominence yet at that time.

Continue reading

“I want to put in my own hard disk”

I want to put in my own hard disk“.

If a customer ever utter that sentence, it will trigger a storage vendor meltdown. Panic buttons, alarm bells, and everything else that will lead a salesman to go berserk. That’s a big NO, NO!

For decades, storage vendors have relied on proprietary hardware to keep customers in line, and have customers continue to sign hefty maintenance contracts until the next tech refresh. The maintenance contract, with support, software upgrades and hardware spares replacement, defines the storage networking industry that we are in. Even as some vendors have commoditized their hardware on the x86 platforms, and on standard enterprise hard disk drives (HDDs), NICs and HBAs, that openness and convenience of commodity hardware savings are usually not passed on the customers.

It is easy to explain to customers that keeping their enterprise data in reliable and high performance storage hardware with performance optimization and special firmware is paramount, and any unwarranted and unvalidated hardware would put the customer’s data at high risk.

There is a choice now. The ripple of enterprise-grade, open storage kernel and file system has just started its first ring, and we hope that this small ripple will reverberate across the storage industry in the next few years.

Continue reading

ARC reactor also caches?

The fictional arc reactor in Iron Man’s suit was the epitome of coolness for us geeks. In the latest edition of Oracle Magazine, Iron Man is on the cover, as well as the other 5 Avengers in a limited edition series (see below).

Just about the same time, I am reading up on the ARC (Adaptive Replacement Caching) that is adopted in ZFS. I am learning in depth of how ZFS caching works as opposed to the more popular LRU (Least Recently Used) caching algorithm that is used in most storage cache memory. Having said that, most storage vendors employed a modified LRU algorithm, with the intention to keep the most recently accessed pages in memory as long as possible. This is true in NetApp’s Data ONTAP (maybe not the ONTAP GX in which I have little experience) and EMC FlareOE. ONTAP goes further to by keeping the most frequently accessed pages permanently in memory. EMC folks would probably refer to most recently accessed as spatial locality while most frequently accessed as temporal locality.

Why is ZFS using ARC and what is ARC? Continue reading

Joy(ent) to the World

When someone as important and as prominent as Jason Hoffman reads and follows your blog, you tend to stand up and take notice. I found out last week that Jason Hoffman, Founder and CTO of Joyent, was doing just that, I was deeply honoured and elated.

My Asian values started kicking in and I felt that I should reciprocate his gracious visits with a piece on Joyent. I have known about Joyent, thanks to Bryan Cantrill as the VP of Engineering because I am bloody impressed with his work with DTrace. And I have followed Joyent’s announcements every now and then, even recommending a job that was posted on Joyent’s website for a Service Delivery Manager in Asia Pacific for my buddy a couple of months ago. He’s one of the best Solaris engineers I have ever worked with but the problem with techies is, they tend to wait for everything to fall into place before they do the next thing. Too methodical!

I took some time over the weekend to understand a bit more about Joyent and their solution offerings. They are doing some mighty cool stuff and if you are Unix/Linux buff/bigot like me, you would be damn impressed. For those people who has experienced Unix and especially Solaris, there is an unexplained element that describes the fire and the passion of such a techie. I was feeling all the good vibes all over again.

Unfortunately, Joyent is not well known in this part of the world but I am well aware of their partnership with a local company called XyBase in an announcement in June last year. Xybase, through its vehicle called Anise Asia, entered into the partnership to resell Joyent’s SmartCenter solution. For those who has worked with XyBase in Malaysia, let’s not go there. 😉

Enough chitter-chatter! What’s Joyent about?

Well, for Malaysian IT followers, we are practically drowned in VMware. VMware does a seminar every 1.5 months or so, and they get invited to other vendors’ events ever so frequently as well. My buddy, Mr. Ong Kok Leong, who was an early employee in VMware Malaysia, has been elevated to superstardom, thanks to his presence in everything VMware. It’s a good thing and kudos to VMware to take advantage of their first-to-market, super gung-ho approach in the last 3 years or so. They have built a sizable lead in the local market and the competitors like Citrix Xen, Microsoft Hyper-V are being left in a dust. I believe only RedHat’s KVM is making a bit of a dent but they are primarily confined to their own RedHat space. Furthermore, most of VMware competitors do not have a strong portfolio and a complete software stack to challenge VMware and what they have been churning out.

Here’s my take … consider Joyent because I see Joyent having a very, very strong portfolio to give VMware a run for its money. Public listed VMware has deep pockets to continue their marketing blitz and because of where they are right now, they have gotten very pricey and complicated. And this blogger intends to level the playing field a bit by sharing more about Joyent and their solutions.

I see Joyent having 4 very strong technologies that differentiates them from others. These technologies (in no particular order) are:

  • node.js
  • ZFS
  • DTrace
  • KVM

These technologies have been proven in the field because Joyent has been deploying, stress testing them and improving on them in their own cloud offering called Joyent Cloud for the last few year. This is true “eating your own dogfood” and putting your money where your mouth is. This is a very important considering when building a Cloud Computing offering, especially in the public cloud space. You need something that is proven and Joyent Cloud is testimonial to Joyent’s technology.

So let’s start with a diagram of the Joyent Cloud Software Stack.


Key to the performance of Joyent Cloud is node.js.

node.js as quoted in its website is “Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.” The key to this is being event-driven and asynchronous and cloud solutions developed using node.js are able to go faster, scale bigger and respond better. The event-based model follows a programming approach in which the flow of the program is determined by events that occurred.

A simple analogy is when you (in Malaysia) is at McDonald’s. In the past, the McDonald’s staff will service and fulfill your order before they service the next customer and so on. That was the flow of the past. Some time last year, McDonalds’ decide that their front staff would take your order, sends you to a queue and then took the order of the next customer. The back-end support staff would then fulfill your order putting that burger and drink on your tray. That is why they are able to serve (take your money) faster and get more things done. This is what I understand about event-driven, when it is applied in a programming content.

node.js has been touted as the new “Ruby-on-Rails” and it is all about low-latency, and concurrency in applications, especially cloud applications. Here’s a video introducing node.js, by Joyent’s very own Ryan Dahl, the creator of node.js.

Besides performance, you would also need a strong and robust file system to ensure security, data integrity and protection of data as it scales. ZFS is a 128-bit, enterprise file system that was developed in Sun more than 10 years ago, and I am a big admirer of the ZFS technology. I have written about ZFS in the past, comparing it with NetApp’s Data ONTAP and also written about ZFS self-healing properties in dealing with Silent Data Corruption. In fact, my buddy (him being the more technical one) and I have been developing storage solutions with ZFS.

Cloud Computing is complex and you have to know what’s happening in the Cloud. For the Cloud Service Provider, they must know the real-time behaviour of the cloud properties. It could be for performance, resource consumption and contention, bottlenecks, applications characteristics, and even for finding the problems as quickly as possible. For the customers, they must have the ability to monitor, understand and report what they are consuming and using in the Cloud.

The regular used buzzword is Analytics and DTrace is the framework developed for Cloud Analytics. When it comes to analytics, nothing comes close to what DTrace can do. Most vendors (including VMware) will provide APIs for 3rd party ISVs to develop cloud analytics but nothing beats having the creator of the cloud technology given you the tools that they use internally. That is what Joyent is giving to the customer, DTrace, a tool that they use themselves internally. Here’s a screenshot of DTrace in action for Joyent’s SmartDataCenter.


I have always said that you got to see  it to know it. Cloud visibility is crucial for the optimal operational efficiency of the cloud.

Joyent already has Solaris Zones technology in its offering. But the missing piece was bare metal hypervisor and last year, Joyent added the final piece. KVM (Kernel-based Virtualization) was ported to Joyent, and KVM is more secure, and faster than the traditional approach of VMware, which relies on binary translation. KVM would mean that the virtualization kernel has direct interaction and communication with the native  x86 virtualization on processors that supports hardware virtualization extension. There is a whole religious debate about native, paravirtualization and binary translation on the web. You can read one here, and as I said, KVM is native virtualization.

There are lots more to know about Joyent but you got to spend some time to learn about it. It is not well known (yet) in this part of the world, my intention in this blog entry is to disseminate information so that you readers don’t have to be droned into one thing only.

There are choices and in the virtualization space, it is just not always about VMware. VMware deserves to be where they are but when one comes into power (like VMware), he/she tends to become less friendly to work it. A customer should not be subjected to this new order of oppression because businesses are there when there are customers. And as customers, they are always choices and Joyent is one good choice.

Primary Dedupe where are you?

I am a bit surprised that primary storage deduplication has not taken off in a big way, unlike the times when the buzz of deduplication first came into being about 4 years ago.

When the first deduplication solutions first came out, it was particularly aimed at the backup data space. It is now more popularly known as secondary data deduplication, the technology has reduced the inefficiencies of backup and helped sparked the frenzy of adulation of companies like Data Domain, Exagrid, Sepaton and Quantum a few years ago. The software vendors were not left out either. Symantec, Commvault, and everyone else in town had data deduplication for backup and archiving.

It was no surprise that EMC battled NetApp and finally won the rights to acquire Data Domain for USD$2.4 billion in 2009. Today, in my opinion, the landscape of secondary data deduplication has pretty much settled and matured. Practically everyone has some sort of secondary data deduplication technology or solution in place.

But then the talk of primary data deduplication hardly cause a ripple when compared a few years ago, especially here in Malaysia. Yeah, the IT crowd is pretty fickle that way because most tend to follow the trend of the moment. Last year was Cloud Computing and now the big buzz word is Big Data.

We are here to look at technologies to solve problems, folks, and primary data deduplication technology solutions should be considered in any IT planning. And it is our job as storage networking professionals to continue to advise customers about what is relevant to their business and addressing their pain points.

I get a bit cheesed off that companies like EMC, or HDS continue to spend their marketing dollars on hyping the trends of the moment rather than using some of their funds to promote good technologies such as primary data deduplication that solve real life problems. The same goes for most IT magazines, publications and other communications mediums, rarely giving space to technologies that solves problems on the ground, and just harping on hypes, fuzz and buzz. It gets a bit too ordinary (and mundane) when they are trying too hard to be extraordinary because everyone is basically talking about the same freaking thing at the same time, over and over again. (Hmmm … I think I am speaking off topic now .. I better shut up!)

We are facing an avalanche of data. The other day, the CEO of Nexenta used the word “data tsunami” but whatever terms used do not matter. There is too much data. Secondary data deduplication solved one part of the problem and now it’s time to talk about the other part, which is data in primary storage, hence primary data deduplication.

What is out there?  Who’s doing what in term of primary data deduplication?

NetApp has their A-SIS (now NetApp Dedupe) for years and they are good in my books. They talk to customers about the benefits of deduplication on their FAS filers. (Side note: I am seeing more benefits of using data compression in primary storage but I am not going to there in this entry). EMC has primary data deduplication in their Celerra years ago but they hardly talk much about it. It’s on their VNX as well but again, nobody in EMC ever speak about their primary deduplication feature.

I have always loved Ocarina Networks ECO technology and Dell don’t give much hoot about Ocarina since the acquisition in  2010. The technology surfaced a few months ago in Dell DX6000G Storage Compression Node for its Object Storage Platform, but then again, all Dell talks about is their Fluid Data Architecture from the Compellent division. Hey Dell, you guys are so one-dimensional! Ocarina is a wonderful gem in their jewel case, and yet all their storage guys talk about are Compellent  and EqualLogic.

Moving on … I ought to knock Oracle on the head too. ZFS has great data deduplication technology that is meant for primary data and a couple of years back, Greenbytes took that and made a solution out of it. I don’t follow what Greenbytes is doing nowadays but I do hope that the big wave of primary data deduplication will rise for companies such as Greenbytes to take off in a big way. No thanks to Oracle for ignoring another gem in ZFS and wasting their resources on pre-sales (in Malaysia) and partners (in Malaysia) that hardly know much about the immense power of ZFS.

But an unexpected source coming from Microsoft could help trigger greater interest in primary data deduplication. I have just read that the next version of Windows Server OS will have primary data deduplication integrated into NTFS. The feature will be available in Windows 8 and the architectural view is shown below:

The primary data deduplication in NTFS will be a feature add-on for Windows Server users. It is implemented as a filter driver on a per volume basis, with each volume a complete, self describing unit. It is cluster aware, and fully crash consistent on all operations.

The technology is Microsoft’s own technology, built from scratch and will be working to position Hyper-V as an strong enterprise choice in its battle for the server virtualization space with VMware. Mind you, VMware already has a big, big lead and this is just something that Microsoft must do-or-die to keep Hyper-V playing catch-up. Otherwise, the gap between Microsoft and VMware in the server virtualization space will be even greater.

I don’t have the full details of this but I read that the NTFS primary deduplication chunk sizes will be between 32KB to 128KB and it will be post-processing.

With Microsoft introducing their technology soon, I hope primary data deduplication will get some deserving accolades because I think most companies are really not doing justice to the great technologies that they have in their jewel cases. And I hope Microsoft, with all its marketing savviness and adeptness, will do some justice to a technology that solves real life’s data problems.

I bid you good luck – Primary Data Deduplication! You deserved better.