We are all familiar with the concept of data archiving. Passive data is archived from production storage and migrated to a slower and often cheaper storage medium, such as tape or SATA disks. Hence the terms nearline and offline data. With that, IT constantly reminds users that archived data is infrequently accessed and that they therefore have to accept slower access to it.
Business conditions have certainly changed, and the need for data to be 100% online is becoming more relevant. The competitive nature of business today dictates that data must be at our fingertips, because speed and agility are the new competitive advantage. Often the total amount of data, production and archived, runs into hundreds of TBs, even into petabytes!
The industries I am familiar with, Oil & Gas and Media & Entertainment, are facing this situation. These industries have a deluge of files and unstructured data in their archives, much of it dormant, inactive and sitting on old tapes from a bygone era. Yet these files and unstructured data have the most potential to be explored, mined and analyzed to realize their value to the organization. In short, the archived data and files must be democratized!
The flip side is that when archived files and unstructured data sit behind a slow access interface or an unreliable storage infrastructure, their value is downgraded, because the friction of getting at them frustrates the applications and business requirements that depend on them. How could organizations value archived data more if the access path to it is so damn hard???!!!
An interesting solution fell into my lap some months ago, and putting A and B together (A + B), I believe the access path to archived data can be unbelievably fast, simple and transparent and, most importantly, can remove the BLOODY PAIN of FILE AND DATA MIGRATION! Storage administrators and engineers familiar with data migration, especially when it runs into hundreds of TBs or even PBs, will know what I mean!
I have known about this solution for some time now, because I have been avidly following its development ever since its founders, who had come to NetApp through its acquisition of their Spinnaker venture, left NetApp to start Avere Systems.
Avere Systems pioneered the Edge-Core architecture for NAS, extending highly accelerated NFS and CIFS/SMB file access performance across local and remote networks. Its technology was able to achieve more than 1.5 million NFSops/sec when it beat the SPECsfs2008 record in 2011. I blogged about Avere’s incredible feat back in 2011.
Since then, they have, of course, extended their capabilities into the cloud with their CloudNAS solution, and I was fortunate enough to get close to their technology at Storage Field Day 6 in November last year.
- Why is Avere’s technology capable of accelerating file and unstructured data access via NAS?
- How does it work with slower file access to archived data?
- How can it immensely reduce the pains of file migration?
Let me explain the A + B part.
Most of the voluminous data archived in the industries I mentioned resides on tape. When the tapes are offline, the data on them has no present value. The pain of searching tapes for files or unstructured data is legendary, and putting the right tapes online to retrieve those files is another nightmare. So, many organizations prefer their archived data to be online all the time. But there is the cost factor.
Some archived files and data are on nearline storage, but it is bloody painful to migrate passive files and data from production storage to nearline storage. Imagine having to migrate 1-2PB over a 10 or 20 Gigabit network. That could take weeks or even months! And the success rate of such migrations isn't that great; if a migration fails, you start again. There are, of course, specialized file migration software and appliances to do the job, but technically it is still like moving a supertanker up a narrow river.
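To put rough numbers on "weeks or even months", here is a back-of-the-envelope estimate in Python. It is a sketch under stated assumptions: decimal units (1 PB = 10^15 bytes, 1 Gbit/s = 10^9 bit/s) and a single, hypothetical `efficiency` factor standing in for protocol overhead, small-file metadata cost and competing production traffic. The 50% figure used below is my illustrative assumption, not a measurement.

```python
# Back-of-the-envelope estimate of bulk migration time.
# Assumptions (illustrative, not measured): decimal units, and one
# "efficiency" factor that lumps together protocol overhead, metadata
# cost for small files and production traffic sharing the link.

SECONDS_PER_DAY = 86_400


def migration_days(petabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Return the days needed to copy `petabytes` over a `link_gbps` link."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = petabytes * 10**15 * 8                   # payload in bits
    bits_per_second = link_gbps * 10**9 * efficiency  # effective throughput
    return bits / bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for pb in (1, 2):
        for gbps in (10, 20):
            ideal = migration_days(pb, gbps)
            assumed = migration_days(pb, gbps, efficiency=0.5)
            print(f"{pb} PB over {gbps} Gbit/s: "
                  f"{ideal:.1f} days at line rate, {assumed:.1f} days at 50%")
```

Even at full line rate, 2 PB over 10 Gbit/s works out to roughly 18.5 days of non-stop copying; at the assumed 50% effective throughput it is over a month, and that is before any restarts after a failed run.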
File and data migration causes these very real, and potentially very costly, issues:
- Acquiring and commissioning the secondary storage platform
- Commissioning a separate high-throughput network for maximum migration speed
- The success rate and timeline of file and data migration are not deterministic
- Keeping file access online and consistent during the migration is difficult; multiple downtime windows might be required
- Project, schedule and resource planning is challenging
- No real fallback plan. Either the transfer completes or it doesn't.
- Spillover effects of lost time, lost productivity and lost files that can spiral into IT operations and administration
After the migration has completed, organizations will discover that:
- The value of the files and unstructured data decreases because access to them is slow and painful
- Remote offices cannot really access them over slower VPN links
- In the organization's overall data landscape, the archives are primarily data silos
- IT still has to provision access to the archived files and data manually, with little automation
- The archived data still cannot be truly mined and analyzed as the demands of data analytics rise
- Probably more that I cannot think of right now …
This is where we have to think out of the box. Many organizations are used to putting the archived data storage layer behind the production storage layer. Why not put Avere Systems' technology layer IN FRONT of the production storage layer?
The diagram below describes a high level overview of Avere’s solution landscape:
It makes a lot of sense. Here are a few obvious reasons:
- We eliminate the immediate migration of data and files to secondary storage. Files can later be moved transparently to a cheaper NAS alternative with Avere's FlashMove feature. (Note: those who have tried the NFS automounter or Microsoft DFS know that they don't scale and lack HA capability.)
- Delaying immediate file migration saves money, time and resources.
- High availability of file access via NFS and CIFS/SMB across EMC Isilon, NetApp filers and HDS HNAS, achieved with Avere's FlashMirror feature.
- Super fast NFS acceleration. (Sorry, I don't have the CIFS/SMB numbers, but I presume it will be super fast as well.) Access to "archived" data is not slow anymore
- Files are available and online all the time from anywhere in the organization, even in remote offices with crappy network lines
- A single NAS file network architecture for the entire organization, so NAS silos can be reduced or eliminated
- Reduced costs with faster file access to all data, active and passive
- Less time wasted and higher productivity for file access: no need to find data on tapes, load them and procure them from difficult storage infrastructures or admins!
- Ready for the cloud: CloudNAS integrates with object-based cloud storage such as Amazon S3, Amplidata and Cleversafe
- Probably more that I cannot think of right now …
The data and file archiving thingy has been on my radar for many years now, especially in my encounters with Oil & Gas companies in the region. And when it finally dawned on me that I could put Data Archiving (A) and this Avere technology (B) together, the whole picture suddenly became very clear.
What is Avere’s technology all about? It is a very clever data placement technology.
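To make "data placement" a little more concrete, here is a minimal sketch of the general edge-core idea: a small, fast edge tier sits in front of a slower core filer, serves hot files itself, and fetches misses from the core. This is my own illustrative assumption, not Avere's actual placement algorithm; the class name `EdgeCache`, the LRU eviction policy, the read-only scope and the example paths are all hypothetical simplifications.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Hypothetical edge tier: a bounded LRU read-through cache in front of a
    slower core filer. Illustrative only; writes, consistency and HA are ignored."""

    def __init__(self, core_read: Callable[[str], bytes], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._core_read = core_read  # slow path, e.g. an archive NAS behind the edge
        self._capacity = capacity    # max number of files held on the fast tier
        self._hot: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._hot:
            self._hot.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self._hot[path]
        self.misses += 1
        data = self._core_read(path)     # first access pays the core's latency
        self._hot[path] = data
        if len(self._hot) > self._capacity:
            self._hot.popitem(last=False)  # evict the least recently used file
        return data


if __name__ == "__main__":
    # Hypothetical "core" contents standing in for archived seismic files.
    core_files = {"/archive/a.segy": b"A", "/archive/b.segy": b"B", "/archive/c.segy": b"C"}
    edge = EdgeCache(core_files.__getitem__, capacity=2)
    for p in ["/archive/a.segy", "/archive/a.segy", "/archive/b.segy",
              "/archive/c.segy", "/archive/a.segy"]:
        edge.read(p)
    print(f"hits={edge.hits} misses={edge.misses}")
```

The point of the sketch is the placement decision: clients always talk to the edge, repeated reads are served from the fast tier, and the core only sees the misses, so the core itself can stay on slower, cheaper storage. A real product would also have to handle writes, coherence across multiple edge nodes and failover, none of which is modelled here.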
Footnote: I hope to write in more depth about their technology and architecture in the future. I have done my deep-dive research, reading up on and understanding their technology, but then … I hope my laziness doesn't stop me 😉
Disclaimer: I am not affiliated with Avere Systems or its partners at this moment. This data archiving dilemma in the Oil & Gas and Media & Entertainment industries has been playing on my mind for a long time, and Avere is merely one solution to this requirement. For tape lovers, there is also another technology I have in mind right now. Contact me to know more.