Storage must go on a diet

Nowadays, the capacity of the hard disk drives (HDDs) are really big. 3TB is out and 4TB is in the horizon. What’s next?

For small-medium businesses in Malaysia, depending on their data requirements and applications, 3-10TB is pretty sufficient  and with room to grow as well. Therefore, a 6TB requirement can be easily satisfied with 2 x 3TB HDDs.

If I were the customer, why would I buy a storage array, with the software licenses and other stuff that will not only increase my cost of equipment acquisition and data management, it will also increase the complexity of my IT infrastructure? I could just slot HDDs into my existing server, RAID it with RAID-0 (not a good idea but to save costs, most customers would do that) and I have a 6TB volume! It’s cheaper, easier to manage with Windows or Linux, and my system administrator doesn’t have to fuss about lack of storage experience.

And RAID isn’t really keeping up with the tremendous growth of HDD’s capacity as well. In fact, RAID is at risk. RAID (especially RAID 5/6) just cannot continue provide the LUN or volume reliability and data availability because it just takes too damn long to rebuild the volume after the failure of a disk.

Back in the days where HDDs were less than 500GB, RAID-5 would still hold up but after passing the 1TB mark, RAID-6 became more prevalent. But now, that 1TB has ballooned to 3TB and RAID-6 is on shaky ground. What’s next? RAID-7? ZFS has RAID-Z3, triple parity but come on, how many vendors have that? With triple parity or stronger RAID (is there one?), the price of the storage array is going to get too costly.

Experts have been speaking about parity-declustering,  but that’s something that a few vendors have right now. Panasas, founded by one of forefathers of RAID, Garth Gibson, comes to mind. In fact, Garth Gibson and Mark Holland of Cargenie-Mellon University’s Parallel Data Lab (PDL) presented a paper about parity-declustering more than 10 years ago.

Let’s get back to our storage fatty. Yes, our storage is getting fat, obese, rotund or whatever you want to call it. And storage vendors have been pushing a concept in hope that storage administrators and customers can take advantage of it. It is called Storage Optimization or Storage Efficiency.

Here are a few ways you can consider to put your storage on a diet.

  • Compression
  • Thin Provisioning
  • Deduplication
  • Storage Tiering
  • Tapes and SSDs

To me, compression has not taken the storage world by storm. But then again, there aren’t many vendors that tout compression as a feature for storage optimization. Most of them rather prefer to push the darling of data reduction, data deduplication, as the main feature for save more space. Theoretically, data deduplication makes more sense when the data is inactive, and has high occurrence of duplicated data. That is why secondary storage such  as backup deduplication targets like Data Domain, HP StoreOnce, Quantum DXi can publish 20:1 rates and over time, that rate can get even higher.

NetApp also has been pushing their A-SIS data deduplication on primary storage. Yes, it helps with the storage savings in primary but when the need for higher data transfer rates and time to access “manipulated” data (deduped or compressed), it is likely that compression is a better choice for primary, active data.

So who has compression? NetApp ONTAP 8.0.1 has compression now and IBM with its Storewize V7000 started as a compression device. Read about IBM Storewize in my blog here. Dell has Ocarina Networks, which was recently unleashed. I am a big fan of Ocarina Networks and I wrote about the technology in my previous blog. EMC, during the Celerra days of DART has compression but I don’t hear much about it in their VNX. Compression is there, believe me, embedded all the loads of EMC marketing.

Thin Provisioning is now a must-have and standard feature of all storage vendors. What is Thin Provisioning? The diagram below shows you:

In the past, storage systems aren’t so intelligent. You ask for 10TB, you are given 10TB and that 10TB is “deducted” from the storage capacity. That leads to wastage and storage inefficiencies. Today, Thin Provisioning will give you 10TB but storage capacity is consumed as it is being used. The capacity is not pre-allocated as in the past. Thin provisioning is a great diet pill for bloated storage projects. 

Another up and coming feature is storage tiering. Storage tiering, when associated to storage optimization, should include hierarchical storage management (HSM) and tape-out as well. Storage optimization solutions should not offer only in the storage array itself. Storage tiering within the storage array is available with most vendors – IBM EasyTier, EMC FAST2, Dell Fluid Data Management and many others. But what about data being moved out of the storage array? What about reducing the capacity of the data online or near-line? Why not put them offline if there isn’t a need for it?

I term this as Active Archiving, something I learned while I was at EMC. Here’s a look at EMC’s style of Active Archiving:

Active Archiving promotes the concept of data archiving and is not unique only to EMC. Almost all storage vendors, either natively or with 3rd party vendors, can perform fairly efficient data archiving in one way or another. One of the software that I liked (and not unique!) is Quantum Stornext. Here’s a video of how Quantum Stornext helps reduce the fat of the storage.

With the single-copy sharing feature of Quantum Stornext to multiple disparate OSes, there are lesser duplicate files in storage as well.

Tapes have been getting a bad name in the past few years. It has been repositioned and repurposed as an archive medium rather than a backup medium. But tape is the greenest and most powerful storage diet pill around. And we should not be discount tapes because tapes are fighting back. Pretty soon you will be hearing about Linear Tape File System (LTFS). In a nutshell, Linear Tape File System (LTFS) allows you to use the tape almost as if it were a hard disk. You can drag and drop files from your server to the tape, see the list of saved files using a standard operating system directory (no backup software catalog needed), and use point and click to restore. How cool is that!

And Solid State Drives (SSDs) makes sense as well.

There are times that we need IOPS and using spinning drives, we have to set up many disk spindles to achieve the IOPS that we want.  For example, using the diagram below from the godfather of storage, Greg Schulz,

The set of 16 spinning HDD drives on the left can only deliver 3,520 IOPS. The problem is, we have wasted a lot of disk space, as seen in the diagram below. This design, which most customer would be accustomed to, may look cheaper but in actual fact, is NOT.

If the price of a Fibre Channel HDD is RM2,000, the total of 16 would make up RM32,000.00. That is not inclusive of additional power and cooling and rack space and also the data management costs. Assuming the SSDs costs 5 times more than the Fibre Channel HDD. SSDs are capable of delivering very high IOPS. Here I am putting a modest 5,000 IOPS per SSDs. With just 2 SSDs (as the right design suggests), the total costs is only RM20,000. It has greater performance room to grow, and also savings in data management, power and cooling.

Folks, consider SSDs as part of your storage diet plan.

All these features are available, in whole or in part, and they are part of the storage technology offerings that is out there. With all these being said, are you doing something about it? Get off your lazy bum and start managing your storage and put your storage on a diet!!!

Crisis? What crisis?

The storage train is still chugging hard and fast as IDC just released its Worldwide Disk Storage System Tracker for 3Q11. Despite the economic climate, the storage market posted a strong 8.5% revenue growth and a whopping 30.7% growth in terms of petabytes shipped. In total, 5,429PB were shipped in Q3.

So how did everyone do in this latest Tracker report?

In the Worldwide Total External Disk Storage Systems, EMC is still holding on to the #1 position, with 28.6%. IBM and NetApp came in at 12.7% and 12.1% respectively. The table below summarizes the percentage view of the top storage players, in terms of revenue.

 

From the table, everyone benefited from the strong buying of storage in the last quarter. EMC gained a strong market gain of almost 3%, while everyone else either gained or lost less than 1% market share.  But the more interesting numbers are not from the market share column but the % growth column.

HDS posted the strongest growth of 22.1%, slightly higher than EMC of 22.0%. HDS is beginning to get their story right, putting the right storage solutions in place, and has been strongly focused in their services offering as well. That’s simply great news for HDS because this is a company is not known for their marketing and advertising. The Japanese “culture” within HDS probably has taught it to be prudent but to see HDS growing faster than the big boys like IBM and HP is something their competitors should respect. I believe customers are beginning to see the true potential of HDS.

As for EMC, everyone labels them as the 800-pound gorilla but they have been very nimble and strong in the storage market for many quarters. This is due to the strong management team headed by Joe Tucci and his heir-in-waiting, Pat Gelsinger. Several of their acquisitions are doing well, with the likes of Isilon, Greenplum, Data Domain, and of course VMware. Even though VMware does not contribute the EMC revenue numbers, the very fact that EMC owns more than 80% of VMware has already given EMC a lot of credibility in the storage battlefield. They are certainly going great guns.

NetApp took a hit in the last quarter, when they missed the street revenue numbers last quarter. Their stock took a beating and there were rumours in the market that NetApp might acquire Commvault and Quantum to compete with EMC. EMC has been able to leverage the list of companies and acquired solutions very well, from data protection solutions like Networker and Avamar, deduplication solutions like Data Domain and Avamar, Documentum for content management and so on, while NetApp has been, for the longest time, prefer a more “loosely-coupled” approach with their partners for a more complete solution set.

Other interesting reports from IDC are the Open SAN/NAS market, the NAS market and the iSCSI market.

The Open SAN/NAS market combination, according to IDC goes like this:

EMC 31.3%
NetApp 14.4%

In the NAS only market, EMC and Isilon (under the one EMC umbrella) competes with NetApp and the table is like this:

EMC 46.7%
NetApp 30.7%

The iSCSI only market is led by Dell (EqualLogic and Compellent combined), followed by EMC and IBM. Here’s the summarized table:

Dell 30.3%
EMC 19.2%
IBM 14.0%

The strong growth is indeed good news as the storage market continues to weather the economic crisis storm. I have been saying this all along. The storage market in IT is still the growth engine as data keeps growing and growing, even though it was never the darling of the IT industry. Let’s hope the trend continues.

Betcha don’t encrypt your disks

At the Internet Alliance event this morning, someone from Computerworld gave me a copy of their latest issue. The headline was “Security Incidents Soar”, with the details of the half-year review by CyberSecurity Malaysia.

Typically, the usual incidents list evolve around spam, intrusions, frauds, viruses and so on. However, storage always seems to be missing. As I see it, storage security doesn’t sit well with the security guys. In fact, storage is never the sexy thing and it is usually the IPS, IDS, anti-virus and firewall that get the highlights. So, when we talk about storage security, there is so little to talk about. In fact, in my almost 20-years of experience, storage security was only brought up ONCE!

In security, the most valuable piece of asset is data and no matter where the data goes, it always lands on …. STORAGE! That is why storage security could be one of the most overlooked piece in security. Fortunately, SNIA already has this covered. In SNIA’s Solid State Storage Initiative (SSSI), one aspect that was worked on was Self Encrypted Drives (SED).

SED is not new. As early as 2007, Seagate already marketed encrypted hard disk drives. In 2009, Seagate introduced enterprise-level encrypted hard disk drives. And not surprisingly, other manufacturers followed. Today, Hitachi, Toshiba, Samsung, and Western Digital have encrypted hard disk drives.

But there were prohibitive factors that dampened the adoption of self-encrypted drives. First of all, it was the costs. It was expensive a few years ago. There was (and still is) a lack of knowledge between the hardware of Self Encrypted Drives (SED) and software-based encryption. As the SED were manufactured, some had proprietary implementations that did not do their part to promote the adoption of SEDs.

As data travels from one infrastructure to another, data encryption can be implemented at different points. As the diagram below shows,

 

encryption can be put in place at the software level, the OS level, at the HBA, the network itself. It can also happen at the switch (network or fabric), at the storage array controller or at the hard disk level.

EMC multipathing software, PowerPath, has an encryption facility to ensure that data is encryption on its way from the HBA to the EMC CLARiiON storage controllers.

The “bump-in-the-wire” appliance is a bridge device that helps in composing encryption to the data before it reaches the storage. Recall that NetApp had a FIPS 140 certified product called Decru DataFort, which basically encrypted NAS and SAN traffic en-route to the NetApp FAS storage array.

And according to SNIA SSSI member, Tom Coughlin, SED makes more sense that software-based security. How does SED work?

First of all, SED works with 2 main keys:

  • Authentication Key (AK)
  • Drive Encryption Key (DEK)

The DEK is the most important component, because it is a symmetric key that encrypts and decrypts data on the HDDs or SSDs. This DEK is not for any Tom (sorry Tom), Dick and Harry. In order to gain access to DEK, one has to be authenticated and the authentication is completed by having the right authentication key (AK). Usually the AK is based on a 128/26-bit AES or DES and DEK is of a higher bit range. The diagram below shows the AK and DEK in action:

Because SED occurs at the drive level, it is significantly simpler to implement, with lower costs as well. For software-based encryption, one has to set up some form of security architecture. IPSec comes to mind. This is not only more complex, but also more costly to implement as well. Since it is software, the degree of security compromise is higher, meaning, the security model is less secure when compared to SED. The DEK of the SED does not leave the array, and if the DEK is implemented within the disk enclosure or the security module of SoC (System-on-Chip), this makes even more secure that software-based encryption. Also, the DEK is away from the CPU and memory, thus removing these components as a potential attack vendor that could compromise the data on the disks drives.

Furthermore, software-based encryption takes up CPU cycles, thus slows down the overall performance. In the Tom Coughlin study, based on both SSDs and HDDs, the performance of SED outperforms software-based encryption every time. Here’s a table from that study:

Another security concern is about data erasure. According to an old IBM study, about 90% of the retired HDDs still has data that is readable. That means that data erasure techniques used are either not implemented properly or simply not good enough. For us in the storage industry, an effective but time consuming technique is to overwrite the entire disks with 1s and reusing it. But to hackers, there are ways to “undelete” these bits and make the data readable again.

SED provides crypto erasure that is both effective and very quick. Since the data encryption key (DEK) was used to encrypt and decrypt data, the DEK can be changed and renewed in split seconds, making the content of the disk drive unreadable. The diagram below shows how crypto erasure works:

Data security is already at its highest alert and SEDs are going to be a key component in the IT infrastructure. The open and common standards are coming together, thanks to efforts to many bodies including SNIA. At the same time, product certifications are coming up and more importantly, the price of SED has come to the level that it is almost on par with normal, non-encrypted drives.

Hackers and data thieves are getting smarter all the time and yet, the security of the most important place of where the data rest is the least considered. SNIA and other bodies hope to create more awareness and seek greater adoption of self encrypted drives. We hope you will help spread the word too. Betcha thinking twice now about encrypting your data  on your disk drives now.

The recipe for storage performance modeling

Good morning, afternoon, evening, Ladies & Gentlemen, wherever you are.

Today, we are going to learn how to bake, errr … I mean, make a storage performance model. Before we begin, allow me to set the stage.

Don’t you just hate it when you are asked to do storage performance sizing and you don’t have a freaking idea how to get started? A typical techie would probably say, “Aiya, just use the capacity lah!”, and usually, they will proceed to size the storage according to capacity. In fact, sizing by capacity is the worst way to do storage performance modeling.

Bear in mind that storage is not a black box, although some people wished it was. It is not black magic when it comes to performance sizing because things can be applied in a very scientific and logical manner.

SNIA (Storage Networking Industry Association) has made a storage performance modeling methodology (that’s quite a mouthful), and basically simplified it into these few key ingredients. This recipe is for storage performance modeling in general and I am advising you guys out there to engage your storage vendors professional services. They will know their storage solutions best.

And I am going to say to you – Don’t be cheap and not engage professional services – to get to the experts out there. I was having a chat with an consultant just now at McDonald’s. I have known this friend of mine for about 6-7 years now and his name is Sugen Sumoo, the Director of DBORA Consulting. They specialize in Oracle and database performance tuning and performance forecasting and it is something that a typical DBA can’t do, because DBORA Consulting is the Professional Service that brings expertise and value to Oracle customers. Likewise, you have to engage your respective storage professional services as well.

In a cook book or a cooking show, you are presented with the ingredients used and in this recipe for storage performance modeling, the ingredients (in no particular order) are:

  • Application block size
  • Read and Write ratio
  • Application access patterns
  • Working set size
  • IOPS or throughput
  • Demand intensity

Application Block Size

First of all, the storage is there to serve applications. We always have to look from the applications’ point of view, not storage’s point of view.  Different applications have different block size. Databases typically range from 8K-64K and backup applications usually deal with larger block sizes. Video applications can have 256K block sizes or higher. It all depends.

The best way is to find out from the DBA, email administrator or application developers. The unfortunate thing is most so-called technical people or administrators in Malaysia doesn’t have a clue about the applications they manage. So, my advice to you storage professionals, do your research on the application and take the default value. These clueless fellas are likely to take the default.

Read and Write ratio

Applications behave differently at different times of the day, and at different times of the month (no, it’s not PMS). At the end of the financial year or calendar, there are some tasks that these applications do as well. But in a typical day, there are different weightage or percentage of read operations versus write operations.

Most OLTP (online transaction processing)-based applications tend to be read heavy and write light, but we need to find out the ratio. Typically, it can be a 2:1 ratio or 60%:40%, but it is best to speak to the application administrators about the ratio. DSS (Decision Support Systems) and data warehousing applications could have much higher reads than writes while a seismic-analysis applications can have multiple writes during the analysis periods. It all depends.

To counter the “clueless” administrators, ask lots of questions. Find out the workflow of several key tasks and ask what that particular tasks do at different checkpoints of the application’s processing. If you are lazy (please don’t be lazy, because it degrades your value as a storage professional), use a rule of thumb.

Application access patterns

Applications behave differently in general. They can be sequential, like backup or video streaming. They can be random like emails, databases at certain times of the day, and so on. All these behavioral patterns affect how we design and size the disks in the storage.

Some RAID levels tend to work well with sequential access and others, with random access. It is not difficult to find out about the applications’ pattern and if you read more about the different RAID-levels in storage, you can easily identify the type of RAID levels suitable for each type of behavioral patterns.

Working set size

This variable is a bit more difficult to determine. This means that a chunk of the application has to be loaded into a working area, usually memory and cache memory, to be used and abused by the application users.

Unless someone is well versed with the applications, one would not be able to determine how much of the applications would be placed in memory and in cache memory. Typically, this can only be determined after the application has been running for some time.

The flexibility of having SSDs, especially the DRAM-type of SSDs, are very useful to ensure that there is sufficient “working space” for these applications.

IOPS or Throughput

According to SNIA model, for I/O less than 64K, IOPS should be used as a yardstick to do storage performance modeling. Anything larger, use throughput, in which MB/sec is the measurement unit.

The application guy would be able to tell you what kind of IOPS their application is expecting or what kind of throughput they want. Again, ask a lot of questions, because this will help you determine the type of disks and the kind of performance you give to the application guys.

If the application guy is clueless again, ask someone more senior or ask the vendor. If the vendor engineers cannot give you an answer, then they should not be working for the vendor.

Demand intensity

This part is usually overlooked when it comes to performance sizing. Demand intensity refers to how intense is the I/O requests. It could come from 1 channel or 1 part of the applications, or it could come from several parts of the applications in parallel. It is as if the storage is being ‘bombarded’ by applications and this is the part that is hard to determine as well.

In some applications, the degree of intensity or parallelism can be tuned and to find out, ask the application administrator or developer. If not, ask the vendor. Also do a lot of research on the application’s architecture.

And one last thing. What I have learned is to add buffers to the storage performance model. Typically I would add about 10-20% extra but you never know. As storage professionals, I would strongly encourage to engage professional services, because it is worthwhile, especially in the early stages of the sizing. It is usually a more expensive affair to size it after the applications have been installed and running.

“Failure to plan is planning to fail”.  The recipe isn’t that difficult. Go figure it out.

Go on and be a storage extraordinaire

ex·tra·or·di·naire – Outstanding or remarkable in a particular capacity

I was plucking the Internet after dinner while I am holidaying right now in Port Dickson. And at about this time, the news from my subscriptions will arrive, perfectly timed as my food is digesting.

And in the news – “IDC Says Cloud Adoption Fuels Storage Sales”. You think?

We are generating so much data in this present moment, that IDC is already saying that we are doubling our data every 2 years. That’s massive and a big part of it is being fueled our adoption to Cloud. It doesn’t matter if it is a public, private or hybrid cloud because the way we use IT has changed forever. It’s all too clear.

Amazon has a massive repository of contents; Google has been gobbling tons of data and statistics since its inception; Apple has made IT more human; and Facebook has changed the way we communicate. FastCompany magazine called Amazon, Apple, Google and Facebook the Big 4 and they will converge sooner or later into what the tornado chasers call a Perfect Storm. Every single effort that these 4 companies are doing now will inevitably meet at one point, where content, communication, computing, data, statistics all become the elements of the Perfect Storm. And the outcome of this has never been more clearer. As FastCompany quoted:

“All of our activity on these devices produces a wealth of data, which leads to the third big idea underpinning their vision. Data is like mother’s milk for Amazon, Apple, Facebook, and Google. Data not only fuels new and better advertising systems (which Google and Facebook depend on) but better insights into what you’d like to buy next (which Amazon and Apple want to know). Data also powers new inventions: Google’s voice-recognition system, its traffic maps, and its spell-checker are all based on large-scale, anonymous customer tracking. These three ideas feed one another in a continuous (and often virtuous) loop. Post-PC devices are intimately connected to individual users. Think of this: You have a family desktop computer, but you probably don’t have a family Kindle. E-books are tied to a single Amazon account and can be read by one person at a time. The same for phones and apps. For the Fab Four, this is a beautiful thing because it means that everything done on your phone, tablet, or e-reader can be associated with you. Your likes, dislikes, and preferences feed new products and creative ways to market them to you. Collectively, the Fab Four have all registered credit-card info on a vast cross-section of Americans. They collect payments (Apple through iTunes, Google with Checkout, Amazon with Amazon Payments, Facebook with in-house credits). Both Google and Amazon recently launched Groupon-like daily-deals services, and Facebook is pursuing deals through its check-in service (after publicly retreating from its own offers product).”

Cloud is changing the way we work, we play, we live and data is now the currency of humans in the developed and developing worlds. And that is good news for us storage professionals, because all the data has to eventually end up in a storage somewhere, somehow.

That is why there is a strong demand for storage networking professionals. Not just any storage professionals but the ones that have the right attitude to keep developing themselves, enhancing their skillset, knowledge and experience. The ones that can forsee that the future will worship them and label them as deities of the Cloud era.

So why are you guys take advantage of this? Well, don’t just sit there and be ordinary. Be a storage extraordinaire now! And for those guys who want to settle of being ordinary … too bad! I said this before – You could lose your job.

Happy school holidays!