Can CDMI emancipate an interoperable medical records cloud ecosystem?

PREFACE: This is just a thought, an idea. I am by no means an expert in this area. I have researched this to inspire a thought process of how we can bring together 2 disparate worlds of medical records and imaging with the emerging cloud services for healthcare.

Healthcare has been moving out of its archaic shell in the past few years, and digital healthcare technology and services are booming. And this movement is part of the digital transformation which could eventually lead to a secure and compliant distribution and collaboration of health data, medical imaging and electronic medical records (EMR).

It is a blessing that today’s medical imaging industry has been consolidated with the DICOM (Digital Imaging and Communications in Medicine) standard. DICOM dictates the how medical imaging information and pictures are used, stored, printed, transmitted and exchanged. It is also a communication protocol which runs over TCP/IP, and links up different service class providers (SCPs) and service class users (SCUs), and the backend systems such as PACS (Picture Archiving & Communications Systems) and RIS (Radiology Information Systems).

Another well accepted standard is HL7 (Health Level 7), a similar Layer 7, application-level communication protocol for transferring and exchanging clinical and administrative data.

The diagram below shows a self-contained ecosystem involving the front-end HIS (Hospital Information Systems), and the integration of healthcare, medical systems and other DICOM modalities.

Hospital Enterprise

(Picture courtesy of Meddiff Technologies)

Continue reading

The reverse wars – DAS vs NAS vs SAN

It has been quite an interesting 2 decades.

In the beginning (starting in the early to mid-90s), SAN (Storage Area Network) was the dominant architecture. DAS (Direct Attached Storage) was on the wane as the channel-like throughput of Fibre Channel protocol coupled by the million-device addressing of FC obliterated parallel SCSI, which was only able to handle 16 devices and throughput up to 80 (later on 160 and 320) MB/sec.

NAS, defined by CIFS/SMB and NFS protocols – was happily chugging along the 100 Mbit/sec network, and occasionally getting sucked into the arguments about why SAN was better than NAS. I was already heavily dipped into NFS, because I was pretty much a SunOS/Solaris bigot back then.

When I joined NetApp in Malaysia in 2000, that NAS-SAN wars were going on, waiting for me. NetApp (or Network Appliance as it was known then) was trying to grow beyond its dot-com roots, into the enterprise space and guys like EMC and HDS were frequently trying to put NetApp down.

It’s a toy”  was the most common jibe I got in regular engagements until EMC suddenly decided to attack Network Appliance directly with their EMC CLARiiON IP4700. EMC guys would fondly remember this as the “NetApp killer“. Continue reading

Why demote archived data access?

We are all familiar with the concept of data archiving. Passive data gets archived from production storage and are migrated to a slower and often, cheaper storage medium such tapes or SATA disks. Hence the terms nearline and offline data are created. With that, IT constantly reminds users that the archived data is infrequently accessed, and therefore, they have to accept the slower access to passive, archived data.

The business conditions have certainly changed, because the need for data to be 100% online is becoming more relevant. The new competitive nature of businesses dictates that data must be at the fingertips, because speed and agility are the new competitive advantage. Often the total amount of data, production and archived data, is into hundred of TBs, even into PetaBytes!

The industries I am familiar with – Oil & Gas, and Media & Entertainment – are facing this situation. These industries have a deluge of files, and unstructured data in its archive, and much of it dormant, inactive and sitting on old tapes of a bygone era. Yet, these files and unstructured data have the most potential to be explored, mined and analyzed to realize its value to the organization. In short, the archived data and files must be democratized!

The flip side is, when the archived files and unstructured data are coupled with a slow access interface or unreliable storage infrastructure, the value of archived data is downgraded because of the aggravated interaction between access and applications and business requirements. How would organizations value archived data more if the access path to the archived data is so damn hard???!!!

An interesting solution fell upon my lap some months ago, and putting A and B together (A + B), I believe the access path to archived data can be unbelievably of high performance, simple, transparent and most importantly, remove the BLOODY PAIN of FILE AND DATA MIGRATION!  For storage administrators and engineers familiar with data migration, especially if the size of the migration is into hundreds of TBs or even PBs, you know what I mean!

I have known this solution for some time now, because I have been avidly following its development after its founders left NetApp following their Spinnaker venture to start Avere Systems.

avere_220

Continue reading

Hail Hydra!

The last of the Storage Field Day 6 on November 7th took me and the other delegates to NEC. There was an obvious, yet eerie silence among everyone about this visit. NEC? Are you kidding me?

NEC isn’t exactly THE exciting storage company in the Silicon Valley, yet I was pleasantly surprised with their HydraStorprowess. It is indeed quite a beast, with published numbers of backup throughput of 4PB/hour, and scales to 100PB of capacity. Most impressive indeed, and HydraStor deserves this blogger’s honourable architectural dissection.

HydraStor is NEC’s grid-based, scale-out storage platform with an object storage backend. The technology, powered by the DynamicStor ™ software, a distributed file system laid over the HydraStor grid architecture. At the same time, it has the DataRedux™ technology that provides the global in-line deduplication as the HydraStor ingests data for data protection, replication, archiving and WORM purposes. It is a massive data consolidation platform, storing gazillion loads of data (100PB you say?) for short-term and long-term retention and recovery.

The architecture is indeed solid, and its data availability goes beyond traditional RAID-level resiliency. HydraStor employs their proprietary erasure coding, called Distributed Resilient Data™. The resiliency knob can be configured to withstand 6 concurrent disks or nodes failure, but by default configured with a resiliency level of 3.

We can quickly deduce that DynamicStor™, DataRedux™ and Distributed Resilient Data™ are the technology pillars of HydraStor. How do they work, and how do they work together?

Let’s look a bit deeper into the HydraStor architecture.

HydraStor is made up of 2 types of nodes:

  • Accelerator Nodes
  • Storage Nodes

The Accelerator Nodes (AN) are the access nodes. They interface with the HydraStor front end, which could be CIFS, NFS or OST (Open Storage Technology). The AN nodes chunks the in-coming data and performs in-line deduplication at a very high speed. It can reach speed of 300TB/hour, which is blazingly fast!

The AN nodes also runs DynamicStor™, handling the performance heavy-lifting portion of HydraStor. The chunked data from the AN nodes are then passed on to the Storage Nodes (SN), where they are further “deduped in-line” to determined if the chunks are unique or not. It is a two-step inline deduplication process. Below is a diagram showing the ANs built above the SNs in the HydraStor grid architecture.

NEC AN & SN grid architecture

 

The HydraStor grid architecture is also a very scalable architecture, allow the dynamic scale-in and scale-out of both ANs and SNs. AN nodes and SN nodes can be added or removed into the system, auto-configuring and auto-optimizing while everything stays online. This capability further strengthens the reliability and the resiliency of the HydraStor.

NEC Hydrastor dynamic topology

Moving on to DataRedux™. DataRedux™ is HydraStor’s global in-line data deduplication technology. It performs dedupe at the sub-file level, with variable length window. This is performed at the AN nodes and the SN nodes level,chunking and creating unique hash values. All unique chunks are further compressed with a modified LZ compression algorithm, shrinking the data to its optimized footprint on the disk storage. To maintain the global in-line deduplication, the hash table is available across the HydraStor cluster.

NEC Deduplication & Compression

The unique data chunk resulting from deduplication and compression are then written to disks using the configured Distributed Resilient Data™ (DRD) algorithm, at its set resiliency level.

At the junction of DRD, with erasure coding parity, the data is broken up into multiples of fragments and assigned a parity to a grouping of fragments. If the resiliency level is set to 3 (the default), the data is broken into 12 pieces, 9 data fragments + 3 parity fragments. The 3 parity fragments corresponds to the resiliency level of 3. See diagram below of the 12 fragments spread across a group of selected disks in the storage pool of the Storage Nodes.

NEC DRD erasure coding on Storage Nodes

 

If the HydraStor experiences a failure in the disks or nodes, and has resulted in the loss of a fragment or fragments, the DRD self-healing function will auto-rebuild and auto-reconfigure the recovered fragments in another set of disks, maintaining the level of 3 parities.

The resiliency level, as mentioned earlier, can be set up to 6, boosting the HydraStor survival factor of 6 disks or nodes failure in the grid. See below of how the autonomous DRD recovery works:

NEC Autonomous Data recovery

Despite lacking the razzle dazzle of most Silicon Valley storage startups and upstarts, credit be given where credit is due. NEC HydraStor is indeed a strong show stopper.

However, in a market that is as fickle as storage, deduplication solutions such as HydraStor, EMC Data Domain, and HP StoreOnce, are being superceded by Copy Data Management technology, touted by Actifio. It was rumoured that EMC restructured their entire BURA (Backup Recovery Archive) division to DPAD (Data Protection and Availability Division) to go after the burgeoning copy data management market.

It would be good if NEC can take notice and turn their HydraStor “supertanker” towards the Copy Data Management market. That would be something special to savour.

P/S: NEC. Sorry about the title. I just couldn’t resist it 😉

SMB on steroids but CIFS lord isn’t pleased

I admit it!

I am one of the guilty parties who continues to use CIFS (Common Internet File System) to represent the Windows file sharing protocol. And a lot of vendors continue to use the “CIFS” word loosely without knowing that it was a something from a bygone era. One of my friends even pronounced it as “See Fist“, which sounded even funnier when he said it. (This is for you Adrian M!)

And we couldn’t be more wrong because we shouldn’t be using the CIFS word anymore. It is so 90’s man! And the tell-tale signs have already been there but most of us chose to ignore it with gusto. But a recent SNIA Webinar titled “SMB 3.0 – New opportunities for Windows Environment” aims to dispel our incompetence and change our CIFS-venture to the correct word – SMB (Server Message Block).

A selfie photo of Dennis Chapman, Senior Technical Director for Microsoft Solutions at NetApp from the SNIA webinar slides above, wants to inform all of us that … SMB History Continue reading

HDS HNAS kicks ass

I am dusting off the cobwebs of my blog. After almost 3 months of inactivity, (and trying to avoid the Social Guidelines Media of my present company), I have bolstered enough energy to start writing again. I am tired, and I am finishing off the previous engagements prior to joining HDS. But I am glad those are coming to an end, with the last job in Beijing next week.

So officially, I will be in HDS as of November 4, 2013 . And to get into my employer’s good books, I think I should start with something that HDS has proved many critics wrong. The notion that HDS is poor with NAS solutions has been dispelled with a recent benchmark report from SPECSfs, especially when it comes to NFS file performance. HDS has never been much of a big shouter about their HNAS, even back in the days of OEM with BlueArc. The gap period after the BlueArc acquisition was also, in my opinion, quiet unless it was the gestation period for this Kick-Ass announcement a couple of weeks ago. Here is one of the news circling in the web, from the ever trusty El-Reg.

HDS has never been big shouting like the guys, like EMC and NetApp, who have plenty of marketing dollars to spend. EMC Isilon and NetApp C-Mode have always touted their mighty SPECSfs numbers, usually with a high number of controllers or nodes behind the benchmarks. More often than not, many readers would probably focus more on the NFSops/sec figures rather than the number of heads required to generate the figures.

Unaware of this HDS announcement, I was already asking myself that question about NFSops/sec per SINGLE controller head. So, on September 26 2013, I did a table comparing some key participants of the SPECSfs2008_nfs.v3 and here is the table:

SPECSfs2008_nfs.v3-26-Sept-2013In the last columns of the 2 halves (which I have highlighted in Red), the NFSops/sec/single controller head numbers are shown. I hope that readers would view the performance numbers more objectively after reading this. Therefore, I let you make your own decisions but ultimately, they are what they are. One should not be over-mesmerized by the super million NFSops/sec until one looks under the hood. Secondly, one should also look at things more holistically such as $/NFSops/sec, $/ORT (overall response time), and $/GB/NFSops/managed and other relevant indicators of the systems sold.

But I do not want to take the thunder away from HDS’ HNAS platforms in this recent benchmark. In summary,

HDS SPECbench summaryTo reach a respectable number of 607,647 NFSops/sec with a sub-second response time is quite incredible. The ORT of 0.59 msecs should not be taken lightly because to eke just about a 0.1 msec is not easy. Therefore, reaching 0.5 millisecond is pretty awesome.

This is my first blog after 3 months. I am glad to be back and hopefully with the monkey off my back (I am referring to my outstanding engagements), I can concentrating on writing good stuff again. I know, I know … I still owe some people some entries. It’s great to be back 🙂

The big boys better be flash friendly

An interesting article came up in the news this week. The article, from the ever popular The Register, mentioned 3 up and rising storage stars, Nimble Storage, Tintri and Tegile, and their assault on a flash strategy “blind spot” of the big boys, notably EMC and NetApp.

I have known about Nimble Storage and Tintri for a couple of years now, and I did take some time to read up on their storage technology offering. Tegile is new to me when it appeared on my radar after SearchStorage.com announced as the Gold Winner of the enterprise storage category for 2012.

The Register article intriqued me because it implied that these traditional storage vendors such as EMC and NetApp are probably doing a “band-aid” when putting together their flash storage strategy. And typically, I see these strategic concepts introduced by these 2 vendors:

  1. Have a server-side cache strategy by putting a PCIe card on the hosting server
  2. Have a network-based all-flash caching area
  3. Have a PCIe-based flash card on the storage system
  4. Have solid state drives (SSDs) in its disk shelves enclosures

In (1), EMC has VFCache (the server side caching software has been renamed to XtremSW Cache and under repackaging with the Xtrem brand name) and NetApp has it FlashAccel solution. Previously, as I was informed, FlashAccel was using the FusionIO ioTurbine solution but just days ago, NetApp expanded the LSI Nytro WarpDrive into its FlashAccel solution as well. The main objective of a server-side caching strategy using flash is to accelerate mostly read-based I/O operations for specific application workloads at the server side.

Continue reading

Is there no one to challenge EMC?

It’s been a busy, busy month for me.

And when the IDC Worldwide Quarterly Disk Storage Systems Tracker for 3Q12 came out last week, I was reading in awe how impressive EMC was at the figures that came out. But most impressive of all is how the storage market continue to grow despite very challenging and uncertain business conditions. With the Eurozone crisis, China experiencing lower economic growth numbers and the uncertainty in the US economic sectors, it is unbelievable that the storage market grew 24.4% y-o-y. And for the first time, 7,104PB was shipped! Yes folks, more than 7 exabytes was shipped during that period!

In the Top 5 external disk storage market based on revenue, only EMC and HDS recorded respectable growth, recording 8.7% and 13.8% respectively. NetApp, my “little engine that could” seems to be running out of steam, earning only 0.9% growth. The rest of the field, IBM and HP, recorded negative growth. Here’s a look at the Top 5 and the rest of the pack:

HP -11% decline is shocking to me, and given the woes after woes that HP has been experiencing, HP has not seen the bottom yet. Let’s hope that the new slew of HP storage products and technologies announced at HP Discover 2012 will lift them up. It also looked like a total rebranding of the HP storage products as well, with a big play on the word “Store”. They have names like StoreOnce, StoreServ, StoreAll, StoreVirtual, StoreEasy and perhaps more coming.

The Open SAN market, which includes iSCSI has EMC again at Number 1, with 29.8%, followed by IBM (14%), HDS (12.2%) and HP (11.8%). When combined with NAS numbers, the NAS + Open SAN market, EMC has 33.5% while NetApp is 13.7%.

Of course, it is just not about external storage because the direct-attached storage numbers count too. With that, the server vendors of IBM, HP and Dell are still placed behind EMC. Here’s a look at that table from IDC:

There’s a highlight of Dell in the table above. Dell actually grew by 4.0% compared to decline in HP and IBM, gaining 0.1%. However, their numbers seem too tepid and led to the exit of Darren Thomas, Dell’s storage group head honco. News of Darren’s exit was on TheRegister.

I also want to note that NAS growth numbers actually outpaced Open SAN numbers including iSCSI.

This leads me to say that there is a dire need for NAS technical and technology expertise in the local storage market. As the adoption of NFSv4 under way and SMB 2.0 and 3.0 coming into the picture, I urge all storage networking professionals who are more pro-SAN to step out of their comfort zone and look into NAS as well. The world is changing and it is no longer SAN vs NAS anymore. And NFSv4.1 is blurring the lines even more with the concepts of layout.

But back to the subject to storage market, is there no one out there challenging EMC in a big way? NetApp was, some years ago, recorded double digit growth and challenging EMC neck-and-neck, but that mantle seems to be taken over by HDS. But both are long way to go to get close to EMC.

Kudos to the EMC team for damn good execution!

NFS-phobic in Malaysia

I taught the EMC Cloud Infrastructure and Services (CIS) class last week and naturally, a few students came from the VMware space. I asked how they were implementing their storage and everyone said Fibre Channel.

I have spoken to a lot of people about this as well in the past, whether they are using SAN or NAS storage for VMware environments. And almost 99% would say SAN, either FC-SAN or iSCSI-based SAN. Why???

When I ask these people about deploying NFS, the usual reply would be related to performance.

NFS version 3 won the file sharing protocol race during its early days where Unix variants were prevalent, but no thanks to the Balkanization of Unices in the 90s. Furthermore, NFS lost quite a bit of ground between NFSv3 in 1995 and the coming out party of NFSv4.1 just 2 years ago. The in-between years were barren and NFS become quite a bit of a joke with “Need For Speed” or “No F*king Security“. That also could be a contributing factor to the NFS-phobia we see here in Malaysia.

I have experiences with both SAN and NAS and understood the respective protocols of Fibre Channel, iSCSI, NFS and CIFS, and I felt that NFS has been given unfair treatment by people in this country. For the uninformed, NFS is the only NAS protocol supported by VMware. CIFS, the Windows file sharing protocol, is not supported, probably for performance and latency reasons. However, if you catch up with high performance computing (HPC), clustering, or MPP (Massively Parallel Processing) resources, almost always you will read about NFS being involved in delivering very high performance I/O. So, why isn’t NFS proposed with confidence in VMware environments?

I have blogged about this before. And I want to use my blog today to reassert what I believe in and hope that more consideration can be given to NFS when it comes to performance, even for virtualized environments.

NFS performance is competitive when compared to Fibre Channel and in a lot of cases, better than iSCSI. It is just that the perception of poor performance in NFS is stuck in people’s mind and it is hard to change that. However, there are multiple credible sources that stated that NFS is comparable to Fibre Channel. Let me share with you one of the source that compared NFS with other transport protocols:

From the 2 graphs of IOPS and Latency, NFS fares well against other more popular transport protocols in VMware environments. Those NFS performance numbers, are probably not RDMA driven as well. Otherwise RDMA could very well boost the NFS numbers into even higher ground.

What is this RDMA (Remote Direct Memory Access)? RDMA is already making its presence felt quietly, and being used with transports like Infiniband and 10 Gigabit Ethernet. In fact, Oracle Solaris version 11 will use RDMA as the default transmission protocol whenever there is a presence of RDMA-enable NICs in the system. The diagram below shows where RDMA fits in in the network stack.

RDMA eliminates the need for the OS to participate in the delivery of data, and directly depositing the data from the initiator’s memory to the target’s memory. This eliminates traditional networking overheads such as buffers copying and setting up network data structures for the delivery. A little comparison of RDMA with traditional networking is shown below:

I was trying to find out how prevalent NFS was in supporting the fastest supercomputers in the world from the Top500 Supercomputing sites. I did not find details of NFS being used, but what I found was the Top500 supercomputers do not employ Fibre Channel SAN at all!  Most have either proprietary interconnects with some on Infiniband and 10 Gigabit Ethernet. I would presume that NFS would figure in most of them, and I am confident that NFS can be a protocol of choice for high performance environments, and even VMware environments.

The future looks bright for NFSv4. We are beginning to see the word of “parallel NFS (pNFS)” being thrown into conversations around here, and the awareness is there. NFS version 4.2 is just around the corner as well, promising greater enhancement to the protocol.