Commvault UDI – a new CPUU

[Preamble: I am a delegate of Storage Field Day 14. My expenses, travel and accommodation are paid for by GestaltIT, the organizer, and I am not obligated to blog or promote the technologies presented at this event. The content of this blog reflects my own opinions and views.]

I am here at Commvault GO 2017. Bob Hammer, Commvault’s CEO, is on stage right now. He shares his wisdom and the message is clear. IT to DT. IT to DT? Yes, Information Technology to Data Technology. It is all about the DATA.

The data landscape has changed. The cloud has changed everything. And data is everywhere. This omnipresence of data presents new complexity and new challenges. It is great to see Commvault acknowledging and accepting this change and the challenges that come along with it, and introducing their HyperScale technology and their secret sauce – Universal Dynamic Index.


Commvault calling again

[Preamble: I will be a delegate of Storage Field Day 14. My expenses, travel and accommodation are paid for by GestaltIT, the organizer and I am not obligated to blog or promote the technologies presented in this event]

I am off to the US again next Monday. I am attending Storage Field Day 14 and it will be a 20+ hour long haul flight. But this SFD has a special twist, because I will be in Washington DC first for the Commvault GO 2017 conference. And I can’t wait.

My first encounter with Commvault goes way back to early 2001. I recall they had their Galaxy version, but in terms of market share they were relatively small compared to Veritas and IBM at the time. I was with NetApp back then, and customers in Malaysia had hardly heard of them, except for the people in Shell IT International (SITI). For those of us in the industry, we all knew that SITI worldwide had an exclusive Commvault fork just for them.


Pure Electric!

I didn’t get a chance to attend the Pure Accelerate event last month. From the blogs and tweets of my friends, Pure Accelerate was an awesome event. When I got the email invitation for the localized Pure Live! event in Kuala Lumpur, I told myself that I had to attend.

The event was yesterday, and I was not disappointed. Coming off a strong fiscal Q1 2018, it appears that Pure Storage has gotten many things together, chugging full steam on all fronts.

When Pure Storage first came out, I was one of the early bloggers who took a fancy to them. My 2011 blog mentioned the storage luminaries in their team. Since then, they have come a long way. And it was apt that on the same morning yesterday, the latest Gartner Magic Quadrant for Solid State Arrays 2017 was released.


The changing face of storage

No, we are not a storage company anymore. We are a data management company now.

I was reading a Forbes article interviewing NetApp’s CIO, Bill Miller. It was titled:

NetApp’s CIO Helps Drive Company’s Shift From Data Storage To Data Management

I was fairly surprised at how long it took for the messaging to shift from storage to data management. I am sure that NetApp has been doing that for years internally.

To me, the writing has been on the wall for years. But a weak perception of storage, at least in this part of Asia, still lingers: that clunky, noisy box full of hard disk drives, sitting behind glass walls or in crufty closets, lodged with snakes and snakes of orange, turquoise or white cables. 😉

The article may come as a revelation to some, but the world of storage has changed for good. The blurring of the lines began when software defined storage, or even earlier in the form of storage virtualization, took shape. I even came up with my own definition of the changing face of storage a couple of years ago. Instead of calling it data management, I called the new storage framework the Data Services Platform.

So, this is my version of the storage technology platform of today. This is the Data Services Platform I have been touting to many for the last couple of years. It is not just storage technology anymore; it is much more than that.


The engineering of Elastifile

[Preamble: I was a delegate of Storage Field Day 12. My expenses, travel and accommodation were paid for by GestaltIT, the organizer and I was not obligated to blog or promote the technologies presented in this event]

When it comes to large scale storage capacity requirements with distributed cloud and on-premise capability, object storage is all the rage. Amazon Web Services started the object-based S3 storage service more than a decade ago, and the romance with object storage started.

Today, there are hundreds of object-based storage vendors out there, touting feature after feature of invincibility. But after researching and reading through many design and architecture papers, I found that many object-based storage technology vendors began to sound the same.

At the back of my mind, object storage is not easy when it comes to integration with most applications. Yes, there is a new breed of cloud-based applications with RESTful CRUD API operations to access object storage, but most applications still rely on file systems to access storage for capacity, performance and protection.

These CRUD and CRUD-like APIs are the common semantics for interfacing with object storage platforms. But many, many real-world applications do not have the object semantics to interface with storage. They are mostly designed to interface and interact with file systems, and secretly, I believe many application developers and users want a file system interface to storage. It does not matter if the storage is on-premise or in the cloud.

Let’s not kid ourselves. We are most natural when we work with files and folders.

Implementing object storage also denies us the ability to optimally utilize Flash and solid state storage on-premise when the compute is in the cloud. Similarly, when the compute is on-premise and the flash-based object storage is in the cloud, you get a mismatch of performance and availability requirements as well. In the end, there has to be a compromise.

Another “feature” of object storage is its poor ability to handle transactional data. Most object storage platforms do not allow modification of data once the object has been created. Putting a NAS front (aka a NAS gateway) does not take away the fact that it is still object-based storage at the very core of the infrastructure, regardless of whether it is on-premise or in the cloud.

Resiliency, latency and scalability are the greatest challenges when we want to build a true globally distributed storage or data services platform. Object storage can be resilient and it can scale, but it has to compromise performance and latency to be so. And managing object storage will not be as natural as managing a file system with folders and files.
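To make the "no modification after creation" point concrete, here is a minimal sketch in Python. `ToyObjectStore` and `patch_object` are hypothetical names of my own and do not mirror any vendor's API; the assumption is simply that the store only offers whole-object put, get and delete.

```python
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store (not a real API)."""

    def __init__(self):
        self._objects = {}  # key -> immutable bytes

    def put(self, key, data):
        # Create or replace: the whole object is written in one shot.
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]


def patch_object(store, key, offset, patch):
    """Change a few bytes of an object; return the bytes sent back to the store."""
    data = bytearray(store.get(key))           # read the whole object
    data[offset:offset + len(patch)] = patch   # modify the local copy
    store.put(key, bytes(data))                # rewrite the whole object
    return len(data)


store = ToyObjectStore()
store.put("logs/app.log", b"A" * 1_000_000)
sent = patch_object(store, "logs/app.log", 10, b"BB")
print(f"changed 2 bytes, rewrote {sent} bytes")
```

With a file system, the same change would be a seek to offset 10 and a 2-byte write. That gap is what makes object storage awkward for transactional data.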

Enter Elastifile.


FlashForward to Beyond

The flash frenzy reached its zenith in 2016. We are no longer interested in listening to storage technology vendors touting the power of solid state storage (NAND Flash included) over spinning drives.

The capacity of 3D NAND Flash SSDs has reached a whopping 15.3TB (that is even bigger than the 12TB 7200RPM HDDs of today), and with deduplication and compression, the storage efficiency has reached a conservative 4:1 or 5:1. Effective capacity of most mid-end storage arrays can easily reach 1-2 Petabytes.
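As a back-of-envelope check on those numbers, here is a small Python sketch. The 24-drive shelf is my own assumption (the drive count is not from any vendor datasheet), and RAID or spare overhead is ignored for simplicity.

```python
# Back-of-envelope effective capacity; the drive count is an assumed figure.
SSD_TB = 15.3       # per-drive raw capacity quoted in the text
DRIVE_COUNT = 24    # assumption: a single 24-slot shelf


def effective_capacity_tb(raw_tb, reduction_ratio):
    """Effective capacity = raw capacity x data reduction ratio."""
    return raw_tb * reduction_ratio


raw_tb = SSD_TB * DRIVE_COUNT
for ratio in (4.0, 5.0):
    effective_pb = effective_capacity_tb(raw_tb, ratio) / 1000
    print(f"{ratio:.0f}:1 -> {effective_pb:.2f} PB effective")
```

Under these assumptions, roughly 367TB of raw flash lands at about 1.47PB effective at 4:1 and about 1.84PB at 5:1, which sits inside the 1-2 Petabyte range.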

And flash and hybrid platforms have reached maturity in these few short years. So what is next?

The landscape has obviously changed. The performance landscape, the capacity landscape and all the related storage data points have changed. And the speed of SSDs, together with the up-and-coming NVMe and NVDIMM technology in new storage array controllers, is also shifting the data bottlenecks to another part of the architecture. The development of I/O communications and interfaces has to change as well, to take advantage of the asynchronous I/Os in storage tiering and caching using NAND Flash.

With this mature and well understood landscape, it is time to take Flash to the next level. This next level comes in the form of an exciting end-user conference in Singapore on 25th April 2017. It is called FlashForward.

The 2016 FlashForward event in Europe has already garnered great support from the cream of the storage technologists around the world, and received fantastic feedback from the end-user attendees. That FlashForward event also saw the birth of an international business and technology exchange in its inaugural introduction. Yes, it is time to learn from the field experts, and it is time to build on the Flash Platform for new Data Services.

From the sponsorship package brochure I have received, it is definitely an event not to be missed.

The FlashForward Conference in Singapore is exquisitely procured by Evito Ltd, under the stewardship of Mr. Paul Talbut. Paul is a very seasoned veteran in the global circuit as an SNIA director of several initiatives. He has been immensely involved in the development of several SNIA chapters around the world, including South Asia, Malaysia, India, China, and even Brazil. He also leads by example with the SNIA Global Steering Committee (GSC); he is the SNIA Global Education Director and at one time, SNIA DPCO (Data Protection & Capacity Optimization) global proctor.

I have had the honour of working with Paul for almost 8 years now, and I am sure he will lead the FlashForward Conference with valuable insights and experiences.

This is probably the greatest period for the industry and end users to get involved in the FlashForward Conference. For one, it is endorsed by SNIA, the vendor-neutral association which has been the growth beacon of the storage networking industry.

Secondly, it is the perfect opportunity for technology vendors to build their mindshare with end users and customers. And with the endorsement of the independent field experts and technology practitioners, end users would have a field day garnering approvals for their decisions, as well as learning the best practices to build upon the Flash technology they have implemented in their data center space.

The sponsorship packages are listed below, and I do encourage technology vendors, especially the All-Flash vendors, to use the FlashForward conference as a platform to build their mindshare, and most of all, their branding.

The dark ages of data is coming

A recent report intrigued me. Given the recent uprising of data, data and more data, the volume of data we are collecting and storing is getting a bit absurd. The flip side is that we might need all this data for analytics and for getting more insight from it.

The Veritas Darkberg report revealed that a very large percentage of the data collected and stored by organizations is useless, unknown and unused. I captured a snapshot of the report below:

[Screenshot of the Veritas Darkberg report]

The screenshot above shows that 54% of the landscape surveyed is dark data, unseen and clogging up the storage. And in an instant, the Darkberg (cross of “Dark” and “Iceberg”) report knocked a lot of sense into this whole data acquisition frenzy we are going through right now.


Why demote archived data access?

We are all familiar with the concept of data archiving. Passive data gets archived from production storage and is migrated to a slower and often cheaper storage medium, such as tapes or SATA disks. Hence the terms nearline and offline data were created. With that, IT constantly reminds users that the archived data is infrequently accessed, and therefore, they have to accept the slower access to passive, archived data.

The business conditions have certainly changed, because the need for data to be 100% online is becoming more relevant. The new competitive nature of businesses dictates that data must be at the fingertips, because speed and agility are the new competitive advantage. Often the total amount of data, production and archived, runs into hundreds of TBs, even into Petabytes!

The industries I am familiar with – Oil & Gas, and Media & Entertainment – are facing this situation. These industries have a deluge of files and unstructured data in their archives, much of it dormant, inactive and sitting on old tapes of a bygone era. Yet, these files and unstructured data have the most potential to be explored, mined and analyzed to realize their value to the organization. In short, the archived data and files must be democratized!

The flip side is, when the archived files and unstructured data are coupled with a slow access interface or unreliable storage infrastructure, the value of archived data is downgraded, because access becomes a source of friction for the applications and the business requirements that depend on it. How would organizations value archived data more if the access path to the archived data is so damn hard???!!!

An interesting solution fell into my lap some months ago, and putting A and B together (A + B), I believe the access path to archived data can be unbelievably high-performance, simple and transparent, and most importantly, it can remove the BLOODY PAIN of FILE AND DATA MIGRATION! For storage administrators and engineers familiar with data migration, especially if the size of the migration is into hundreds of TBs or even PBs, you know what I mean!

I have known this solution for some time now, because I have been avidly following its development after its founders left NetApp following their Spinnaker venture to start Avere Systems.


Hail Hydra!

The last day of Storage Field Day 6, on November 7th, took me and the other delegates to NEC. There was an obvious, yet eerie silence among everyone about this visit. NEC? Are you kidding me?

NEC isn’t exactly THE exciting storage company in Silicon Valley, yet I was pleasantly surprised by their HydraStor prowess. It is indeed quite a beast, with published numbers of backup throughput of 4PB/hour, and scales to 100PB of capacity. Most impressive indeed, and HydraStor deserves this blogger’s honourable architectural dissection.

HydraStor is NEC’s grid-based, scale-out storage platform with an object storage backend. The technology is powered by the DynamicStor™ software, a distributed file system laid over the HydraStor grid architecture. At the same time, it has the DataRedux™ technology that provides the global in-line deduplication as the HydraStor ingests data for data protection, replication, archiving and WORM purposes. It is a massive data consolidation platform, storing gazillion loads of data (100PB you say?) for short-term and long-term retention and recovery.

The architecture is indeed solid, and its data availability goes beyond traditional RAID-level resiliency. HydraStor employs their proprietary erasure coding, called Distributed Resilient Data™. The resiliency knob can be configured to withstand 6 concurrent disk or node failures, but by default it is configured with a resiliency level of 3.

We can quickly deduce that DynamicStor™, DataRedux™ and Distributed Resilient Data™ are the technology pillars of HydraStor. How do they work, and how do they work together?

Let’s look a bit deeper into the HydraStor architecture.

HydraStor is made up of 2 types of nodes:

  • Accelerator Nodes
  • Storage Nodes

The Accelerator Nodes (AN) are the access nodes. They interface with the HydraStor front end, which could be CIFS, NFS or OST (Open Storage Technology). The AN nodes chunk the incoming data and perform in-line deduplication at a very high speed. They can reach speeds of 300TB/hour, which is blazingly fast!

The AN nodes also run DynamicStor™, handling the performance heavy-lifting portion of HydraStor. The chunked data from the AN nodes are then passed on to the Storage Nodes (SN), where they are further “deduped in-line” to determine whether the chunks are unique or not. It is a two-step inline deduplication process. Below is a diagram showing the ANs built above the SNs in the HydraStor grid architecture.

NEC AN & SN grid architecture

 

The HydraStor grid architecture is also a very scalable architecture, allowing the dynamic scale-in and scale-out of both ANs and SNs. AN nodes and SN nodes can be added to or removed from the system, auto-configuring and auto-optimizing while everything stays online. This capability further strengthens the reliability and the resiliency of the HydraStor.

NEC Hydrastor dynamic topology

Moving on to DataRedux™. DataRedux™ is HydraStor’s global in-line data deduplication technology. It performs dedupe at the sub-file level, with a variable-length window. This is performed at both the AN node and the SN node level, chunking the data and creating unique hash values. All unique chunks are further compressed with a modified LZ compression algorithm, shrinking the data to its optimized footprint on the disk storage. To maintain the global in-line deduplication, the hash table is available across the HydraStor cluster.
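To illustrate the general idea of variable-length chunking, hashing and compression, here is a minimal Python sketch. It is not NEC's DataRedux™ implementation. The toy rolling hash, the chunk size limits, SHA-256 as the fingerprint and zlib (DEFLATE, an LZ77-based compressor) as the stand-in for the modified LZ algorithm are all my own assumptions, as are the names `chunk` and `DedupeStore`.

```python
import hashlib
import zlib


def chunk(data, mask=0x3FF, min_size=256, max_size=8192):
    """Content-defined chunking: cut where a toy rolling value hits a bit pattern."""
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, rolling = i + 1, 0
    if start < len(data):
        yield data[start:]  # trailing partial chunk


class DedupeStore:
    """Hypothetical single-process stand-in for a cluster-wide hash table."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> compressed unique chunk

    def ingest(self, data):
        recipe = []
        for piece in chunk(data):
            fp = hashlib.sha256(piece).hexdigest()
            if fp not in self.chunks:  # only unique chunks are stored
                self.chunks[fp] = zlib.compress(piece)
            recipe.append(fp)
        return recipe  # ordered fingerprints needed to rebuild the stream

    def restore(self, recipe):
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in recipe)


store = DedupeStore()
backup = bytes(range(256)) * 64
first = store.ingest(backup)
unique_after_first = len(store.chunks)
second = store.ingest(backup)  # an identical second backup adds no new chunks
print(len(store.chunks) == unique_after_first, store.restore(second) == backup)
```

The second ingest only produces a list of fingerprints, because every chunk is already in the table. That is the effect a global hash table gives across backups.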

NEC Deduplication & Compression

The unique data chunks resulting from deduplication and compression are then written to disks using the configured Distributed Resilient Data™ (DRD) algorithm, at its set resiliency level.

At the DRD stage, the data is broken up into multiple fragments, and parity is computed over each group of fragments using erasure coding. If the resiliency level is set to 3 (the default), the data is broken into 12 pieces, 9 data fragments + 3 parity fragments. The 3 parity fragments correspond to the resiliency level of 3. See diagram below of the 12 fragments spread across a group of selected disks in the storage pool of the Storage Nodes.
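The fragment arithmetic can be sketched in a few lines of Python. The text only gives the 9 + 3 split for the default level. Keeping the total at 12 fragments for other levels, and allowing levels from 1 to 6, are assumptions on my part, and the function names are hypothetical.

```python
TOTAL_FRAGMENTS = 12  # given for the default level; assumed fixed for other levels


def drd_layout(resiliency, total=TOTAL_FRAGMENTS):
    """Split a chunk into data and parity fragments for a resiliency level."""
    if not 1 <= resiliency <= 6:  # assumed valid range; 6 is the stated maximum
        raise ValueError("resiliency level expected between 1 and 6")
    data = total - resiliency
    return {
        "data_fragments": data,
        "parity_fragments": resiliency,
        "raw_per_usable": total / data,  # raw bytes written per usable byte
    }


def survives(lost_fragments, resiliency):
    """A chunk stays recoverable while losses do not exceed the parity count."""
    return lost_fragments <= resiliency


for level in (3, 6):
    print(level, drd_layout(level))
```

Under these assumptions, the default level of 3 writes 12/9, or about 1.33 bytes of raw capacity per usable byte, while level 6 would write 2 bytes per usable byte in exchange for tolerating six lost fragments.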

NEC DRD erasure coding on Storage Nodes

 

If the HydraStor experiences a failure in the disks or nodes, resulting in the loss of a fragment or fragments, the DRD self-healing function will auto-rebuild and auto-reconfigure the recovered fragments on another set of disks, maintaining the level of 3 parities.

The resiliency level, as mentioned earlier, can be set up to 6, boosting HydraStor’s survival factor to 6 disk or node failures in the grid. See below how the autonomous DRD recovery works:

NEC Autonomous Data recovery

Despite lacking the razzle dazzle of most Silicon Valley storage startups and upstarts, credit be given where credit is due. NEC HydraStor is indeed a strong show stopper.

However, in a market that is as fickle as storage, deduplication solutions such as HydraStor, EMC Data Domain, and HP StoreOnce are being superseded by Copy Data Management technology, touted by Actifio. It was rumoured that EMC restructured their entire BURA (Backup Recovery Archive) division to DPAD (Data Protection and Availability Division) to go after the burgeoning copy data management market.

It would be good if NEC can take notice and turn their HydraStor “supertanker” towards the Copy Data Management market. That would be something special to savour.

P/S: NEC. Sorry about the title. I just couldn’t resist it 😉