Captain Dynamo Storage System

My research on file systems brought me to an very interesting piece of article. It is titled “Dynamo: Amazon’s Highly Available Key-Value Store” dated 2007.

Yes, this is an internal storage systems designed and developed in Amazon to scale and support Amazon Web Services (AWS). It is a very complex piece of technology and the paper is highly technical (not for the faint of heart). And of all places, Amazon is probably the last place you think you would find such smart technology, but it’s true. AWS engineers are slowly revealing the many of their innovations (think Amazon Silk browser technology).

And it appears that many of the latest cloud-based computing and services companies such as Amazon, Google and many others have been developing new methods of storing data objects. These methods are very different from the traditional methods of storing data, and many are no longer adopting the relational database model (RDBMS) to scale their business.

The traditional 3-tier architecture often adopted by web-based (before the advent of “cloud”), is evolving. As shown in the diagram below:

the foundation tier is usually a relational database (or a distributed relational database), communicating with the back-end storage (usually a SAN).

All that is changing because the relational database model is not keeping up with the tremendous pace of the proliferation of web-based and cloud-based objects or unstructured data. As explained by Alex Iskold, a writer of ReadWriteWeb, there are scalability issues with the conventional relational database.

 

Before I get to the scalability issues mentioned in the above diagram, let me set the floor for discussion.

For theoretical schoolers of relational database, the term ACID defines and guarantees the transactional reliability of relational databases. ACID stands for Atomicity, Consistency, Isolation and Durability. According to Wikipedia, “transactions provide an “all-or-nothing” proposition, stating that each work-unit performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must get written to durable storage.”

ACID has been the cornerstone of relational database from the very beginning. But as the demands of greater scalability and greater distribution of data, all 4 components of ACID – Atomicity, Consistency, Isolation, Durability – can no longer hold true. Hence, the CAP Theorem.

CAP Theorem (aka Brewer’s Theorem) stands for Consistency, Availability and Partition Tolerance. In the ACM (Association of Computing Machinery) conference in 2000, Eric Brewer of University of California, Berkeley delivered the theorem. It states that it is impossible for a distributed computer system (or a database system) to simultaneously guarantee all 3 components – Consistency, Availability and Partition Tolerance.

Therefore, as the database systems become more and more distributed in cyberspace, the ACID theorem begins to break down. All 4 components of ACID cannot be guaranteed simultaneously anymore as the database systems begin to become more and more distributed.

So when we get back to the diagram, both the concepts on left and right – Master/Slave OR Multiple Peers – will put a tremendous strain on the single, non-distributed relational database.

New data models are surfacing to handling the very distributed data sets. Distributed object-based  “file systems” and NoSQL type of databases are some of the unconventional data storage “systems” that are beginning to surface as viable alternatives to the relational database method in cyberspace. And one of them is the Amazon Dynamo Storage System. (ADSS)

ADSS is a highly available, Amazon-proprietary key-value distributed data store. ADSS has both the properties of distributed hash table and a database and it is used internally to power various Cloud Services in Amazon Web Services (AWS).

 

It behaves like a relational database where it stores data objects to be retrieved. However, the data objects are not stored in a table format of a conventional relational database. Instead, the data is stored in a distributed hash table and data content or value is retrieved with a key, hence a key-value data model.

The data content is stored and retrieved through a simple put and get interface, much like how RESTful would do it. From the article in ReadWriteWeb, here’s how Dynamo works:

  • Physical nodes are thought of as identical and organized into a ring.
  • Virtual nodes are created by the system and mapped onto physical nodes, so that hardware can be swapped for maintenance and failure.
  • The partitioning algorithm is one of the most complicated pieces of the system, it specifies which nodes will store a given object.
  • The partitioning mechanism automatically scales as nodes enter and leave the system.
  • Every object is asynchronously replicated to N nodes.
  • The updates to the system occur asynchronously and may result in multiple copies of the object in the system with slightly different states.
  • The discrepancies in the system are reconciled after a period of time, ensuring eventual consistency.
  • Any node in the system can be issued a put or get request for any key

The Dynamo architecture addresses the CAP Theorem well. It is highly available, where nodes, either physical or virtual,  can be easily swapped without affected the storage services. It is also high performance, nodes (again physical or virtual) can be added to boost the performance. The high performance and highly available components addresses the “A” piece of CAP.

Its distributed nature also allows it to scale to billions and billions of data objects and hence meets the “P” requirement of CAP. The Partitioning Tolerance is definitely there.

However, as stated by CAP Theorem, you can’t have all 3 happening at the same time. Therefore, the “C” or Consistency piece of CAP has to be compromised. That is why Dynamo has been labeled an “eventually consistency” storage system.

As data is stored into ADSS, the changes of the data is propogated and will be asynchronously replicated to other nodes in the system, eventually making all the data objects and its value consistent. However, given the speed of things in cyberspace and the nature of most Cloud Computing services, the consistency piece could be difficult to accomplish and that is OK because in most of the transactions that are distributed, inconsistency is acceptable.

So that’s a bit about the Amazon Dynamo. Alas, we may never get our grubby hands on this piece of cool data storage and management technology, but knowing that Dynamo is powering AWS and its business is an eye-opener for us into the realm of a new technology evolution.

Is there IOPS for Cloud Storage? – Nasuni style

I was in Singapore last week attending the Cloud Infrastructure Services course.

In the class, one of the foundation components of Cloud Computing is of course, storage. As the students and the instructor talked about Storage, one very interesting argument surfaced. It revolved around the storage, if it was offered on the cloud. A lot of people assumed that Cloud Storage would be for their databases, and their virtual machines, which of course, is true when the communication between the applications, virtual machines and databases are in the local area network of the Cloud Service Provider (CSP).

However, if the storage is offered through the cloud to applications that are sitting on-premise in the customer’s server room, then we have to think twice of how we perceive Cloud Storage. In this aspect, the Cloud Storage offered by the CSP is a Infrastructure-as-a-Service (IaaS), where the key service is Storage. We have to differentiate that this Storage functions as a data container, and usually not for I/O performance reasons.

Though this concept probably will be easily understood by storage professionals like us, this can cause a bit confusion for someone new to the concept of Cloud Computing and Cloud Storage. This confusion, unfortunately, is caused by many of us who are vendors or solution providers, or even publications and magazines. We are responsible to disseminate correct information to customers, but due to our lack of knowledge and experience in this extremely new market of Cloud Storage, we have created the FUDs (Fear, Uncertainty and Doubt) and hype.

Therefore, it is the duty of this blogger to clear the vapourware, and hopefully pass on the right information to accelerate  the adoption of Cloud Storage in the near future. At this moment, given the various factors such as network costs, high network latency and lack of key network technologies similar to LAN in Cloud Computing, Cloud Storage is, most of the time, for data storage containership and archiving only. And there are no IOPS or any performance related statistics related to Cloud Storage. If any engineer or vendor tells you that they have the fastest Cloud Storage in the industry, do me a favour. Give him/her a knock on the head for me!

Of course, as technologies evolve, this could change in the near future. For now, Cloud Storage is a container, NOT a high performance storage in the cloud. It is usually not meant for transactional data. There are many vendors in the Cloud Storage space from real CSPs to storage companies offering re-packaged storage boxes that are “cloud-ready”. A good example of a CSP offering Cloud Storage is Amazon S3 (Simple Storage Service). And storage vendors such as EMC and HDS are repackaging and rebranding their storage technologies as object storage, ready for the cloud. EMC Atmos is really a repackaged and rebranded Centera, with some slight modifications, while HDS , using their Archiving solution, has HCP (aka HCAP). There’s nothing wrong with what EMC and HDS have done, but before the overhyping of the world of Cloud Computing, these platforms were meant for immutable data archiving reasons. Just thought you should know.

One particular company that captured my imagination and addresses the storage performance portion is Nasuni. Of course, they are quite inventive with the Cloud Storage Gateway approach. Nasuni comes up with a Cloud Storage Gateway filer appliance, which can be either a physical 1U server or as a VMware or Hyper-V virtual appliance sitting on-premise at the customer’s site.

The key to this is “on-premise”, which allows access to data much faster because they are locally-cached in the Nasuni filer appliance itself. This Nasuni filer piece addresses the Cloud Storage “performance” piece but Nasuni do not claim any performance statistics with such implementation. The clever bit is that this addresses data or files that are transactional in nature, i.e. NFS or CIFS, to serve data or files “locally”. (I wonder if Nasuni filer has iSCSI as well. Hmmmm….)

In the Nasuni architecture, they “break up” their “Cloud Storage” into 2 pieces. Piece #1 sits on-premise, at the customer site, and acts as a bridge to the Piece #2, that is sitting in a Cloud Storage. From a simplified view, have a look at the diagram below:

 

 

Piece #1 is the component that handles some of the transactional traffic related to files. In a more technical diagram below, you can see that the Nasuni filer addresses the file sharing portion, using the local disks on the filer appliance as a local caching mechanism.

 

Furthermore, older file pieces are whiffed away to the any Cloud Storage using the Cloud Connector interface, hence giving the customer a sense that their storage capacity needs can be limitless if they want to (for a fee, of course). At the same time, the Nasuni filer support thin provisioning and snapshots. How cool is that!

The Cloud Storage piece (Piece #2) is used for the data container and archiving reasons. This component can be sitting and hosted at Amazon S3, Microsoft Azure, Rackspace Cloud Files, Nirvanix Storage Delivery Network and Iron Mountain Archive Services Platform.

The data communication and transfer between the Nasuni filer is secure, encrypted, deduplication and compressed, giving it the efficiency and security that most customers would be concerned about. The diagram below explains the dat communication and data transfer bit.

 

In this manner, the Nasuni filer can replace traditional NAS platforms and can potentially provide a much lower total cost of ownership (TCO) in the long run. Nasuni does not pretend to be a NAS replacement. To me, this concept is very inventive and could potentially change the way we perceive file sharing and file server, obscuring and blurring concept of NAS.

Again, I would like to reiterate that Nasuni does not attempt to say their solution is a NAS or a performance-based Cloud Storage but what they have cleverly packaged seems to be appealing to customers. Their customer base has grown 78% in Q2 of 2011. It’s just too bad they are not here in Malaysia or this part of the world (yet).

IOPS in Cloud Storage? Not yet.

 

A little yellow elephant

By now, I believe most of you in the storage networking world would have heard of Hadoop. Hadoop was created by Doug Cutting, while he and his team was working on an open source web search engine called Nutch. The easily recognized little yellow elephant, Hadoop, was Doug Cutting’s son toy, which he made as Hadoop’s mascot. Pretty cool!

And today, Hadoop has become THE platform for Big Data applications. Why?

As I have mentioned before, everything that we do or don’t do, generates data, either as a direct product or in-direct product. I am blogging right now and I am creating data. I was in Singapore the whole of this week and everywhere I go in the MRT stations, I am being watched by the video cameras they have at the station. A new friend in class said that Singapore is the second most “watched” city after London, where there are video cameras mounted everywhere, either discreetly or indiscreetly. And that’s just video data. And there’s plenty of other human activities that generate tons and tons of data.

IDC Digital Universe Report for 2011 said that we have generated 1.8ZB (zettabyte) of data this year alone. I mentioned in my previous blog that this is a gold mine and companies are scrambling to tap on massive amount of data.  Extracting valuable information to anticipate the next trend or predict that next evolution in human preference is akin to the Gold Rush in the wild, wild west in the late 19th century. Folks, Big Data is going to be this generation’s “Digital Gold Rush”.

Sieving, filtering and processing gazillions of data (more unstructured than structured) will not work in defined, well-formatted relational databases. The data model of relational databases will simply break down. And of course, there are different schools of thoughts of different data models, but the Hadoop model seems to be gaining momentum and mind share of data scientists. That is because of Hadoop’s capability to deal with massive unstructured data, processing it and producing results in a small amount of time.

One way to process the pool of massive data is parallel programming. In parallel programming, multi-threading is commonly deployed to achieve the performance and effects of programming. But implementing multi-threading in parallel programming is difficult. Developers often has to deal with LWP (lightweight processes), semaphores, shared memory, mutex (mutually exclusive) locking and so on. Hence this style of programming works with different states on shared data, often resulting in different results in different states, even when using the same programming expression.

Hadoop belongs to another school of programming known as functional programming, where the different states on shared data concept is removed. With that in mind, the dependency on different states is also removed, resulting in a much easier and simpler parallel programming implementation. Hadoop borrows ideas from the MapReduce software framework made well known by Google and the Google File System.

Before, we get to know Hadoop, we must know MapReduce. MapReduce is a framework which allows very large data sets to be processed with a very large set of computer nodes in a cluster. Typically the computational processing is executed in a distributed fashion, spread across many computer nodes and final results are consolidated from the sub-results of these distributed processing nodes.

According to Wikipedia, the 2 key functions of Map Reduce are map() and reduce(). That’s pretty obvious. The extract below was taken from the Wikipedia definition, and explains both functions very well.

“Map” step: The master node takes the input, partitions it up into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.

“Reduce” step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.

The diagram below probably can simplify the concept of MapReduce to the readers.

 

Hadoop is one of the open-source implementations of MapReduce. It is one of the projects of Apache Foundation, and the project has sparked a brand-new niche of data search, data management and data science. The diagram below will allow our readers to juxtapose MapReduce and Hadoop, and comparing them in the simplest fashion.

Hadoop primary development platform is Java. Hadoop’s architecture consists mainly of 2 components – Hadoop Common and a Hadoop-compatible file system, as shown in the diagram below.

Hadoop MapReduce layer above is the file/object access interface to the Hadoop-compatible file system below. HDFS is Hadoop Distributed File System is just one of a few Hadoop-compatible file systems. Other file systems include:

  • Amazon S3 File System as part of the Amazon EC2 Infrastructure-as-a-Service (IaaS) cloud platform
  • CloudStore – a similar Hadoop-like implementation using C++ and also inspired by Google File System
  • FTP file systems
  • HTTP and HTTPS read-only file systems
  • Any file systems accessible with the file:// URL nomenclature

But the main engine of Hadoop is in the MapReduce layer. The 2 core components in this layer is JobTracker and TaskTracker. Both has their own individual roles to play and collectively, they are key cogs in the Hadoop distributed data processing model.

Below are extract I picked up from Wikipedia.

JobTracker submits MapReduce jobs to client applications. The JobTracker pushes work out to available TaskTracker nodes in the cluster, striving to keep the work as close to the data as possible. With a rack-aware filesystem, the JobTracker knows which node contains the data, and which other machines are nearby. If the work cannot be hosted on the actual node where the data resides, priority is given to nodes in the same rack. This reduces network traffic on the main backbone network. If a TaskTracker fails or times out, that part of the job is rescheduled. The TaskTracker on each node spawns off a separate Java Virtual Machine process to prevent the TaskTracker itself from failing if the running job crashes the JVM. A heartbeat is sent from the TaskTracker to the JobTracker every few minutes to check its status. The Job Tracker and TaskTracker status and information is exposed by Jetty and can be viewed from a web browser. Jetty is a Java-based HTTP server, among other things

JobTracker records what it is up to in the filesystem. When a JobTracker starts up, it looks for any such data, so that it can restart work from where it left off.

Scheduling

By default Hadoop uses first-in, first-out (FIFO), and optional 5 scheduling priorities to schedule jobs from a work queue. In version 0.19 the job scheduler was refactored out of the JobTracker, and added the ability to use an alternate scheduler (such as the Fair scheduler or the Capacity scheduler).

Fair scheduler

The fair scheduler was developed by Facebook. The goal of the fair scheduler is to provide fast response times for small jobs and QoS (Quality of Service) for production jobs. The fair scheduler has three basic concepts.

  1. Jobs are grouped into Pools.
  2. Each pool is assigned a guaranteed minimum share.
  3. Excess capacity is split between jobs.

By default jobs that are uncategorized go into a default pool. Pools have to specify the minimum number of map slots, reduce slots, and a limit on the number of running jobs.

Capacity scheduler

The capacity scheduler was developed by Yahoo. The capacity scheduler supports several features which are similar to the fair scheduler.

  • Jobs are submitted into queues.
  • Queues are allocated a fraction of the total resource capacity.
  • Free resources are allocated to queues beyond their total capacity.
  • Within a queue a job with a high level of priority will have access to the queue’s resources.

I took most the extract below from Wikipedia, and I don’t claim to be a knowledgeable person on Hadoop. All the credits go to Wikipedia editors to put Hadoop in layman terms.

Hadoop has certainly won the hearts of the new digital gold rush, Big Data and is slowly becoming a force to be reckoned with among data scientists. Hadoop implementations are powering new frontiers in processing and mining the ever growing data capacity, giving solution providers a simple programming methodology and data model to gain more insights into the vast seas of data and information.

Hadoop has many fans, and slowly becoming the data platform for large companies such as Yahoo!, Facebook, IBM, Amazon, Apple, eBay and many more. Facebook even claims to have the largest Hadoop clusters in the world, growing to 30PB in July of 2011.

This little yellow elephant is going places and one to watch out for.

Storage must go on a diet

Nowadays, the capacity of the hard disk drives (HDDs) are really big. 3TB is out and 4TB is in the horizon. What’s next?

For small-medium businesses in Malaysia, depending on their data requirements and applications, 3-10TB is pretty sufficient  and with room to grow as well. Therefore, a 6TB requirement can be easily satisfied with 2 x 3TB HDDs.

If I were the customer, why would I buy a storage array, with the software licenses and other stuff that will not only increase my cost of equipment acquisition and data management, it will also increase the complexity of my IT infrastructure? I could just slot HDDs into my existing server, RAID it with RAID-0 (not a good idea but to save costs, most customers would do that) and I have a 6TB volume! It’s cheaper, easier to manage with Windows or Linux, and my system administrator doesn’t have to fuss about lack of storage experience.

And RAID isn’t really keeping up with the tremendous growth of HDD’s capacity as well. In fact, RAID is at risk. RAID (especially RAID 5/6) just cannot continue provide the LUN or volume reliability and data availability because it just takes too damn long to rebuild the volume after the failure of a disk.

Back in the days where HDDs were less than 500GB, RAID-5 would still hold up but after passing the 1TB mark, RAID-6 became more prevalent. But now, that 1TB has ballooned to 3TB and RAID-6 is on shaky ground. What’s next? RAID-7? ZFS has RAID-Z3, triple parity but come on, how many vendors have that? With triple parity or stronger RAID (is there one?), the price of the storage array is going to get too costly.

Experts have been speaking about parity-declustering,  but that’s something that a few vendors have right now. Panasas, founded by one of forefathers of RAID, Garth Gibson, comes to mind. In fact, Garth Gibson and Mark Holland of Cargenie-Mellon University’s Parallel Data Lab (PDL) presented a paper about parity-declustering more than 10 years ago.

Let’s get back to our storage fatty. Yes, our storage is getting fat, obese, rotund or whatever you want to call it. And storage vendors have been pushing a concept in hope that storage administrators and customers can take advantage of it. It is called Storage Optimization or Storage Efficiency.

Here are a few ways you can consider to put your storage on a diet.

  • Compression
  • Thin Provisioning
  • Deduplication
  • Storage Tiering
  • Tapes and SSDs

To me, compression has not taken the storage world by storm. But then again, there aren’t many vendors that tout compression as a feature for storage optimization. Most of them rather prefer to push the darling of data reduction, data deduplication, as the main feature for save more space. Theoretically, data deduplication makes more sense when the data is inactive, and has high occurrence of duplicated data. That is why secondary storage such  as backup deduplication targets like Data Domain, HP StoreOnce, Quantum DXi can publish 20:1 rates and over time, that rate can get even higher.

NetApp also has been pushing their A-SIS data deduplication on primary storage. Yes, it helps with the storage savings in primary but when the need for higher data transfer rates and time to access “manipulated” data (deduped or compressed), it is likely that compression is a better choice for primary, active data.

So who has compression? NetApp ONTAP 8.0.1 has compression now and IBM with its Storewize V7000 started as a compression device. Read about IBM Storewize in my blog here. Dell has Ocarina Networks, which was recently unleashed. I am a big fan of Ocarina Networks and I wrote about the technology in my previous blog. EMC, during the Celerra days of DART has compression but I don’t hear much about it in their VNX. Compression is there, believe me, embedded all the loads of EMC marketing.

Thin Provisioning is now a must-have and standard feature of all storage vendors. What is Thin Provisioning? The diagram below shows you:

In the past, storage systems aren’t so intelligent. You ask for 10TB, you are given 10TB and that 10TB is “deducted” from the storage capacity. That leads to wastage and storage inefficiencies. Today, Thin Provisioning will give you 10TB but storage capacity is consumed as it is being used. The capacity is not pre-allocated as in the past. Thin provisioning is a great diet pill for bloated storage projects. 

Another up and coming feature is storage tiering. Storage tiering, when associated to storage optimization, should include hierarchical storage management (HSM) and tape-out as well. Storage optimization solutions should not offer only in the storage array itself. Storage tiering within the storage array is available with most vendors – IBM EasyTier, EMC FAST2, Dell Fluid Data Management and many others. But what about data being moved out of the storage array? What about reducing the capacity of the data online or near-line? Why not put them offline if there isn’t a need for it?

I term this as Active Archiving, something I learned while I was at EMC. Here’s a look at EMC’s style of Active Archiving:

Active Archiving promotes the concept of data archiving and is not unique only to EMC. Almost all storage vendors, either natively or with 3rd party vendors, can perform fairly efficient data archiving in one way or another. One of the software that I liked (and not unique!) is Quantum Stornext. Here’s a video of how Quantum Stornext helps reduce the fat of the storage.

With the single-copy sharing feature of Quantum Stornext to multiple disparate OSes, there are lesser duplicate files in storage as well.

Tapes have been getting a bad name in the past few years. It has been repositioned and repurposed as an archive medium rather than a backup medium. But tape is the greenest and most powerful storage diet pill around. And we should not be discount tapes because tapes are fighting back. Pretty soon you will be hearing about Linear Tape File System (LTFS). In a nutshell, Linear Tape File System (LTFS) allows you to use the tape almost as if it were a hard disk. You can drag and drop files from your server to the tape, see the list of saved files using a standard operating system directory (no backup software catalog needed), and use point and click to restore. How cool is that!

And Solid State Drives (SSDs) makes sense as well.

There are times that we need IOPS and using spinning drives, we have to set up many disk spindles to achieve the IOPS that we want.  For example, using the diagram below from the godfather of storage, Greg Schulz,

The set of 16 spinning HDD drives on the left can only deliver 3,520 IOPS. The problem is, we have wasted a lot of disk space, as seen in the diagram below. This design, which most customer would be accustomed to, may look cheaper but in actual fact, is NOT.

If the price of a Fibre Channel HDD is RM2,000, the total of 16 would make up RM32,000.00. That is not inclusive of additional power and cooling and rack space and also the data management costs. Assuming the SSDs costs 5 times more than the Fibre Channel HDD. SSDs are capable of delivering very high IOPS. Here I am putting a modest 5,000 IOPS per SSDs. With just 2 SSDs (as the right design suggests), the total costs is only RM20,000. It has greater performance room to grow, and also savings in data management, power and cooling.

Folks, consider SSDs as part of your storage diet plan.

All these features are available, in whole or in part, and they are part of the storage technology offerings that is out there. With all these being said, are you doing something about it? Get off your lazy bum and start managing your storage and put your storage on a diet!!!

Crisis? What crisis?

The storage train is still chugging hard and fast as IDC just released its Worldwide Disk Storage System Tracker for 3Q11. Despite the economic climate, the storage market posted a strong 8.5% revenue growth and a whopping 30.7% growth in terms of petabytes shipped. In total, 5,429PB were shipped in Q3.

So how did everyone do in this latest Tracker report?

In the Worldwide Total External Disk Storage Systems, EMC is still holding on to the #1 position, with 28.6%. IBM and NetApp came in at 12.7% and 12.1% respectively. The table below summarizes the percentage view of the top storage players, in terms of revenue.

 

From the table, everyone benefited from the strong buying of storage in the last quarter. EMC gained a strong market gain of almost 3%, while everyone else either gained or lost less than 1% market share.  But the more interesting numbers are not from the market share column but the % growth column.

HDS posted the strongest growth of 22.1%, slightly higher than EMC of 22.0%. HDS is beginning to get their story right, putting the right storage solutions in place, and has been strongly focused in their services offering as well. That’s simply great news for HDS because this is a company is not known for their marketing and advertising. The Japanese “culture” within HDS probably has taught it to be prudent but to see HDS growing faster than the big boys like IBM and HP is something their competitors should respect. I believe customers are beginning to see the true potential of HDS.

As for EMC, everyone labels them as the 800-pound gorilla but they have been very nimble and strong in the storage market for many quarters. This is due to the strong management team headed by Joe Tucci and his heir-in-waiting, Pat Gelsinger. Several of their acquisitions are doing well, with the likes of Isilon, Greenplum, Data Domain, and of course VMware. Even though VMware does not contribute the EMC revenue numbers, the very fact that EMC owns more than 80% of VMware has already given EMC a lot of credibility in the storage battlefield. They are certainly going great guns.

NetApp took a hit in the last quarter, when they missed the street revenue numbers last quarter. Their stock took a beating and there were rumours in the market that NetApp might acquire Commvault and Quantum to compete with EMC. EMC has been able to leverage the list of companies and acquired solutions very well, from data protection solutions like Networker and Avamar, deduplication solutions like Data Domain and Avamar, Documentum for content management and so on, while NetApp has been, for the longest time, prefer a more “loosely-coupled” approach with their partners for a more complete solution set.

Other interesting reports from IDC are the Open SAN/NAS market, the NAS market and the iSCSI market.

The Open SAN/NAS market combination, according to IDC goes like this:

EMC 31.3%
NetApp 14.4%

In the NAS only market, EMC and Isilon (under the one EMC umbrella) competes with NetApp and the table is like this:

EMC 46.7%
NetApp 30.7%

The iSCSI only market is led by Dell (EqualLogic and Compellent combined), followed by EMC and IBM. Here’s the summarized table:

Dell 30.3%
EMC 19.2%
IBM 14.0%

The strong growth is indeed good news as the storage market continues to weather the economic crisis storm. I have been saying this all along. The storage market in IT is still the growth engine as data keeps growing and growing, even though it was never the darling of the IT industry. Let’s hope the trend continues.

Betcha don’t encrypt your disks

At the Internet Alliance event this morning, someone from Computerworld gave me a copy of their latest issue. The headline was “Security Incidents Soar”, with the details of the half-year review by CyberSecurity Malaysia.

Typically, the usual incidents list evolve around spam, intrusions, frauds, viruses and so on. However, storage always seems to be missing. As I see it, storage security doesn’t sit well with the security guys. In fact, storage is never the sexy thing and it is usually the IPS, IDS, anti-virus and firewall that get the highlights. So, when we talk about storage security, there is so little to talk about. In fact, in my almost 20-years of experience, storage security was only brought up ONCE!

In security, the most valuable piece of asset is data and no matter where the data goes, it always lands on …. STORAGE! That is why storage security could be one of the most overlooked piece in security. Fortunately, SNIA already has this covered. In SNIA’s Solid State Storage Initiative (SSSI), one aspect that was worked on was Self Encrypted Drives (SED).

SED is not new. As early as 2007, Seagate already marketed encrypted hard disk drives. In 2009, Seagate introduced enterprise-level encrypted hard disk drives. And not surprisingly, other manufacturers followed. Today, Hitachi, Toshiba, Samsung, and Western Digital have encrypted hard disk drives.

But there were prohibitive factors that dampened the adoption of self-encrypted drives. First of all, it was the costs. It was expensive a few years ago. There was (and still is) a lack of knowledge between the hardware of Self Encrypted Drives (SED) and software-based encryption. As the SED were manufactured, some had proprietary implementations that did not do their part to promote the adoption of SEDs.

As data travels from one infrastructure to another, data encryption can be implemented at different points. As the diagram below shows,

 

encryption can be put in place at the software level, the OS level, at the HBA, the network itself. It can also happen at the switch (network or fabric), at the storage array controller or at the hard disk level.

EMC multipathing software, PowerPath, has an encryption facility to ensure that data is encryption on its way from the HBA to the EMC CLARiiON storage controllers.

The “bump-in-the-wire” appliance is a bridge device that helps in composing encryption to the data before it reaches the storage. Recall that NetApp had a FIPS 140 certified product called Decru DataFort, which basically encrypted NAS and SAN traffic en-route to the NetApp FAS storage array.

And according to SNIA SSSI member, Tom Coughlin, SED makes more sense that software-based security. How does SED work?

First of all, SED works with 2 main keys:

  • Authentication Key (AK)
  • Drive Encryption Key (DEK)

The DEK is the most important component, because it is a symmetric key that encrypts and decrypts data on the HDDs or SSDs. This DEK is not for any Tom (sorry Tom), Dick and Harry. In order to gain access to DEK, one has to be authenticated and the authentication is completed by having the right authentication key (AK). Usually the AK is based on a 128/26-bit AES or DES and DEK is of a higher bit range. The diagram below shows the AK and DEK in action:

Because SED occurs at the drive level, it is significantly simpler to implement, with lower costs as well. For software-based encryption, one has to set up some form of security architecture. IPSec comes to mind. This is not only more complex, but also more costly to implement as well. Since it is software, the degree of security compromise is higher, meaning, the security model is less secure when compared to SED. The DEK of the SED does not leave the array, and if the DEK is implemented within the disk enclosure or the security module of SoC (System-on-Chip), this makes even more secure that software-based encryption. Also, the DEK is away from the CPU and memory, thus removing these components as a potential attack vendor that could compromise the data on the disks drives.

Furthermore, software-based encryption takes up CPU cycles, thus slows down the overall performance. In the Tom Coughlin study, based on both SSDs and HDDs, the performance of SED outperforms software-based encryption every time. Here’s a table from that study:

Another security concern is about data erasure. According to an old IBM study, about 90% of the retired HDDs still has data that is readable. That means that data erasure techniques used are either not implemented properly or simply not good enough. For us in the storage industry, an effective but time consuming technique is to overwrite the entire disks with 1s and reusing it. But to hackers, there are ways to “undelete” these bits and make the data readable again.

SED provides crypto erasure that is both effective and very quick. Since the data encryption key (DEK) was used to encrypt and decrypt data, the DEK can be changed and renewed in split seconds, making the content of the disk drive unreadable. The diagram below shows how crypto erasure works:

Data security is already at its highest alert and SEDs are going to be a key component in the IT infrastructure. The open and common standards are coming together, thanks to efforts to many bodies including SNIA. At the same time, product certifications are coming up and more importantly, the price of SED has come to the level that it is almost on par with normal, non-encrypted drives.

Hackers and data thieves are getting smarter all the time and yet, the security of the most important place of where the data rest is the least considered. SNIA and other bodies hope to create more awareness and seek greater adoption of self encrypted drives. We hope you will help spread the word too. Betcha thinking twice now about encrypting your data  on your disk drives now.

The recipe for storage performance modeling

Good morning, afternoon, evening, Ladies & Gentlemen, wherever you are.

Today, we are going to learn how to bake, errr … I mean, make a storage performance model. Before we begin, allow me to set the stage.

Don’t you just hate it when you are asked to do storage performance sizing and you don’t have a freaking idea how to get started? A typical techie would probably say, “Aiya, just use the capacity lah!”, and usually, they will proceed to size the storage according to capacity. In fact, sizing by capacity is the worst way to do storage performance modeling.

Bear in mind that storage is not a black box, although some people wished it was. It is not black magic when it comes to performance sizing because things can be applied in a very scientific and logical manner.

SNIA (Storage Networking Industry Association) has made a storage performance modeling methodology (that’s quite a mouthful), and basically simplified it into these few key ingredients. This recipe is for storage performance modeling in general and I am advising you guys out there to engage your storage vendors professional services. They will know their storage solutions best.

And I am going to say to you – Don’t be cheap and not engage professional services – to get to the experts out there. I was having a chat with an consultant just now at McDonald’s. I have known this friend of mine for about 6-7 years now and his name is Sugen Sumoo, the Director of DBORA Consulting. They specialize in Oracle and database performance tuning and performance forecasting and it is something that a typical DBA can’t do, because DBORA Consulting is the Professional Service that brings expertise and value to Oracle customers. Likewise, you have to engage your respective storage professional services as well.

In a cook book or a cooking show, you are presented with the ingredients used and in this recipe for storage performance modeling, the ingredients (in no particular order) are:

  • Application block size
  • Read and Write ratio
  • Application access patterns
  • Working set size
  • IOPS or throughput
  • Demand intensity

Application Block Size

First of all, the storage is there to serve applications. We always have to look from the applications’ point of view, not storage’s point of view.  Different applications have different block size. Databases typically range from 8K-64K and backup applications usually deal with larger block sizes. Video applications can have 256K block sizes or higher. It all depends.

The best way is to find out from the DBA, email administrator or application developers. The unfortunate thing is most so-called technical people or administrators in Malaysia doesn’t have a clue about the applications they manage. So, my advice to you storage professionals, do your research on the application and take the default value. These clueless fellas are likely to take the default.

Read and Write ratio

Applications behave differently at different times of the day, and at different times of the month (no, it’s not PMS). At the end of the financial year or calendar, there are some tasks that these applications do as well. But in a typical day, there are different weightage or percentage of read operations versus write operations.

Most OLTP (online transaction processing)-based applications tend to be read heavy and write light, but we need to find out the ratio. Typically, it can be a 2:1 ratio or 60%:40%, but it is best to speak to the application administrators about the ratio. DSS (Decision Support Systems) and data warehousing applications could have much higher reads than writes while a seismic-analysis applications can have multiple writes during the analysis periods. It all depends.

To counter the “clueless” administrators, ask lots of questions. Find out the workflow of several key tasks and ask what that particular tasks do at different checkpoints of the application’s processing. If you are lazy (please don’t be lazy, because it degrades your value as a storage professional), use a rule of thumb.

Application access patterns

Applications behave differently in general. They can be sequential, like backup or video streaming. They can be random like emails, databases at certain times of the day, and so on. All these behavioral patterns affect how we design and size the disks in the storage.

Some RAID levels tend to work well with sequential access and others, with random access. It is not difficult to find out about the applications’ pattern and if you read more about the different RAID-levels in storage, you can easily identify the type of RAID levels suitable for each type of behavioral patterns.

Working set size

This variable is a bit more difficult to determine. This means that a chunk of the application has to be loaded into a working area, usually memory and cache memory, to be used and abused by the application users.

Unless someone is well versed with the applications, one would not be able to determine how much of the applications would be placed in memory and in cache memory. Typically, this can only be determined after the application has been running for some time.

The flexibility of having SSDs, especially the DRAM-type of SSDs, are very useful to ensure that there is sufficient “working space” for these applications.

IOPS or Throughput

According to SNIA model, for I/O less than 64K, IOPS should be used as a yardstick to do storage performance modeling. Anything larger, use throughput, in which MB/sec is the measurement unit.

The application guy would be able to tell you what kind of IOPS their application is expecting or what kind of throughput they want. Again, ask a lot of questions, because this will help you determine the type of disks and the kind of performance you give to the application guys.

If the application guy is clueless again, ask someone more senior or ask the vendor. If the vendor engineers cannot give you an answer, then they should not be working for the vendor.

Demand intensity

This part is usually overlooked when it comes to performance sizing. Demand intensity refers to how intense is the I/O requests. It could come from 1 channel or 1 part of the applications, or it could come from several parts of the applications in parallel. It is as if the storage is being ‘bombarded’ by applications and this is the part that is hard to determine as well.

In some applications, the degree of intensity or parallelism can be tuned and to find out, ask the application administrator or developer. If not, ask the vendor. Also do a lot of research on the application’s architecture.

And one last thing. What I have learned is to add buffers to the storage performance model. Typically I would add about 10-20% extra but you never know. As storage professionals, I would strongly encourage to engage professional services, because it is worthwhile, especially in the early stages of the sizing. It is usually a more expensive affair to size it after the applications have been installed and running.

“Failure to plan is planning to fail”.  The recipe isn’t that difficult. Go figure it out.

Go on and be a storage extraordinaire

ex·tra·or·di·naire – Outstanding or remarkable in a particular capacity

I was plucking the Internet after dinner while I am holidaying right now in Port Dickson. And at about this time, the news from my subscriptions will arrive, perfectly timed as my food is digesting.

And in the news – “IDC Says Cloud Adoption Fuels Storage Sales”. You think?

We are generating so much data in this present moment, that IDC is already saying that we are doubling our data every 2 years. That’s massive and a big part of it is being fueled our adoption to Cloud. It doesn’t matter if it is a public, private or hybrid cloud because the way we use IT has changed forever. It’s all too clear.

Amazon has a massive repository of contents; Google has been gobbling tons of data and statistics since its inception; Apple has made IT more human; and Facebook has changed the way we communicate. FastCompany magazine called Amazon, Apple, Google and Facebook the Big 4 and they will converge sooner or later into what the tornado chasers call a Perfect Storm. Every single effort that these 4 companies are doing now will inevitably meet at one point, where content, communication, computing, data, statistics all become the elements of the Perfect Storm. And the outcome of this has never been more clearer. As FastCompany quoted:

“All of our activity on these devices produces a wealth of data, which leads to the third big idea underpinning their vision. Data is like mother’s milk for Amazon, Apple, Facebook, and Google. Data not only fuels new and better advertising systems (which Google and Facebook depend on) but better insights into what you’d like to buy next (which Amazon and Apple want to know). Data also powers new inventions: Google’s voice-recognition system, its traffic maps, and its spell-checker are all based on large-scale, anonymous customer tracking. These three ideas feed one another in a continuous (and often virtuous) loop. Post-PC devices are intimately connected to individual users. Think of this: You have a family desktop computer, but you probably don’t have a family Kindle. E-books are tied to a single Amazon account and can be read by one person at a time. The same for phones and apps. For the Fab Four, this is a beautiful thing because it means that everything done on your phone, tablet, or e-reader can be associated with you. Your likes, dislikes, and preferences feed new products and creative ways to market them to you. Collectively, the Fab Four have all registered credit-card info on a vast cross-section of Americans. They collect payments (Apple through iTunes, Google with Checkout, Amazon with Amazon Payments, Facebook with in-house credits). Both Google and Amazon recently launched Groupon-like daily-deals services, and Facebook is pursuing deals through its check-in service (after publicly retreating from its own offers product).”

Cloud is changing the way we work, we play, we live and data is now the currency of humans in the developed and developing worlds. And that is good news for us storage professionals, because all the data has to eventually end up in a storage somewhere, somehow.

That is why there is a strong demand for storage networking professionals. Not just any storage professionals but the ones that have the right attitude to keep developing themselves, enhancing their skillset, knowledge and experience. The ones that can forsee that the future will worship them and label them as deities of the Cloud era.

So why are you guys take advantage of this? Well, don’t just sit there and be ordinary. Be a storage extraordinaire now! And for those guys who want to settle of being ordinary … too bad! I said this before – You could lose your job.

Happy school holidays!