EMC and NetApp gaining market share with the latest IDC figures

The IDC 2Q11 global disk storage systems report is out. The good news is that data is still growing, and at a tremendous pace. Both revenue and capacity have raced ahead with double-digit growth, with capacity growth reaching almost 50%.

And not surprisingly to me, EMC and NetApp have gained market share at the expense of HP, IBM and Dell. Here are a couple of statistics tables:

Both EMC and NetApp have recorded more than 25% revenue growth, taking 1st and joint-2nd place respectively. I have always been impressed by both companies.

For EMC, the 800lb gorilla of the storage market, to achieve 26% revenue growth is a massive, massive endorsement of how well EMC executes. They are like a big oil tanker in rough seas, with the ability to do a 90-degree turn in the blink of an eye. Kudos to Joe Tucci and Pat Gelsinger.

NetApp has always been my “little engine that could”. Their ability to take market share Q-on-Q and Yr-on-Yr is second to none and, once again, they did not disappoint. Even the change of the big man, from Dan Warmenhoven to Tom Georgens, did not leave a smudge on its armour. And with the purchase of LSI’s Engenio storage business this year, NetApp will go from strength to strength, gaining market share at the expense of others. I believe NetApp’s culture plays a big role in their ability and their success. The management has always been honest and frank, and there’s a lot of respect for an individual’s ability to contribute. No wonder they are the #5 best company to work for in the US.

The big surprise for me here is Hitachi Data Systems, posting 23.3% growth. That’s tremendous because HDS has never been known to hit such high growth. Perhaps they have finally got the formula right. Their VSP and AMS ranges must be selling well but again, for HDS, it is a challenge running 2 different cultural systems within the company. The Japanese team and the US team must be hitting synchronicity at last.

Dell, despite firing on all cylinders with EqualLogic and Compellent, actually lost market share. Their partnership with EMC has come to an end and they have not converted their customers to the EqualLogic and Compellent boxes. The Compellent purchase is fairly new (Q1 of 2011) and this will take some time to sink in with their customers. Let’s see how they fare in the next IDC report.

In the table above, HP has always been king of the hill. Bundling their direct attached or internal storage with their servers, just like IBM, has given them an unfair advantage. But for the first time, EMC has outshipped HP, without the presence of DAS and internal storage (which EMC does not sell). Even with the purchase of 3PAR late last year, HP was not able to get the best out of what 3PAR can offer. Not to mention that HP also has LeftHand Networks, which has now been rebranded as the P4000. On the other hand, this is a fantastic result for EMC.

Where’s IBM in all this? Rather anemic, sad to say, compared to EMC and NetApp. IBM’s figures were half of what EMC and NetApp are posting, and this is not good. They don’t have the right weapons to compete. XIV is slowly taking over the mantle of DS8000 as their flagship storage, and their DS series is putting up its usual numbers. But that’s not good enough, because if you look at the IBM line-up, their Shark is pretty much gone. XIV and Storwiz(e) are the only 2 storage platforms that IBM owns. Mind you, Storwiz(e) is not really a primary storage solution. It’s a compression engine. The DS-series and N-series actually belong to LSI’s Engenio (now owned by NetApp) and NetApp respectively. So, IBM lacks the IP for storage, and in the long run IBM must do something about it. They must either buy or innovate. They should have bought NetApp when they had the chance in 2002, but today NetApp is becoming an impossible meal to swallow.

We shall see how IBM turns out but if they continue to suffer from anemia, there’s going to be trouble down the road.

As for HP, what can I say? Their XP range is from HDS, but with 3PAR in the picture, it looks like the marriage could be ending soon. EVA is an aging platform and they have to refresh it with stronger mid-tier platforms. As for the low end of the range, MSA is also unexciting, and I secretly believe that LeftHand should have stepped up. Unfortunately, HP sales have to be careful not to push MSA and LeftHand side-by-side, so that they do not cannibalize each other. HP definitely has a challenge on its hands, and both 3PAR and LeftHand have been with them for more than 2 quarters. It’s time to execute, because the IDC figures have already shown that they are slipping.

What next HP?

 

All-SSD storage array? There’s more than meets the eye at Pure Storage

Wow, after an entire week off with the holidays, I am back and excited about the many happenings in the storage world.

One of the more prominent pieces of news was Pure Storage launching its enterprise storage array, built entirely with flash-based solid state drives. In addition, there were other start-ups offering SSD storage arrays. The likes of Nimbus Data, Avere and Violin Memory Systems all made the news, as did the granddaddy of solid state storage arrays, Texas Memory Systems.

The first thing that came to my mind was, “Wow, this is great because this will push down the $/GB of SSDs closer to the range of $/GB for spinning disks”. But then skepticism crept in and I thought, “Do we really need an entire enterprise storage array of SSDs? That’s going to cost the world”.

At the same time, we in the storage industry know that no two pieces of data are alike. They can be large, small, random, sequential, accessed frequently or infrequently and so on. It is obviously better to tier the storage, using SSDs for Tier 0, 10K/15K RPM spinning HDDs for Tier 1, SATA for Tier 2 and perhaps tape for the archive tier. I was already tempted to write a pessimistic piece on Pure Storage when something interesting caught my attention.
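To make the tiering idea concrete, here is a minimal Python sketch of a placement policy. The `Dataset` type, the `assign_tier` function and every threshold in it are hypothetical, chosen only to illustrate matching access patterns to the tiers listed above; real arrays use their own, usually proprietary, heuristics.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    accesses_per_day: float  # how often the data is read or written
    random_io: bool          # True if the access pattern is mostly random


def assign_tier(ds: Dataset) -> str:
    """Map a dataset to a storage tier using made-up, illustrative thresholds."""
    if ds.accesses_per_day >= 10_000 and ds.random_io:
        return "Tier 0 (SSD)"
    if ds.accesses_per_day >= 1_000:
        return "Tier 1 (10K/15K RPM HDD)"
    if ds.accesses_per_day >= 1:
        return "Tier 2 (SATA)"
    return "Archive (tape)"


for ds in [
    Dataset("oltp-index", 50_000, True),
    Dataset("file-share", 2_000, False),
    Dataset("monthly-reports", 5, False),
    Dataset("old-backups", 0.01, False),
]:
    print(f"{ds.name:16} -> {assign_tier(ds)}")
```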

Besides the usual marketing jive of sub-millisecond, predictable latency, green messaging, global inline deduplication and compression, and data integrity built into its Purity Operating Environment (POE), I was very surprised to find the team behind Pure Storage. Here’s their line-up:

  • Scott Dietzen, CEO – starting from principal technologist of Transarc (sold to IBM), principal architect of WebLogic (sold to BEA Systems), CTO of BEA (sold to Oracle), CTO of Zimbra (sold to Yahoo! and then to VMware)
  • John “Coz” Colgrove, Founder & CTO – Veritas Fellow, CTO of Symantec Data Management group, principal architect of Veritas Volume Manager (VxVM) and Veritas File System (VxFS) and holder of 70 patents
  • John Hayes, Founder & Chief Architect – formerly of Yahoo!’s Office of the Chief Technologist
  • Bob Wood, VP of Engineering – formerly NetApp’s VP of File System Engineering
  • Michael Cornwell, Director of Technology & Strategy – formerly the lead technologist of Sun Microsystems’ Sun Storage F5100 Flash Array and also Quantum’s storage architect for their storage telemetry, VTL and DXi solutions
  • Ko Yamamoto, VP of System Engineering – previously NetApp’s director of platform engineering, Quantum DXi director of hardware engineering, and also key contributor to 4-generations of Tandem NonStop technology

In addition, there are 3 key individual investors worth mentioning:

  • Diane Greene – Founder and former CEO of VMware
  • Dr. Mendel Rosenblum – Founder and former Chief Scientist and creator of VMware
  • Frank Slootman – formerly CEO of Data Domain (acquired by EMC)

All these industry big guns are flocking to Pure Storage for a reason, and it looks to me that Pure Storage ain’t your ordinary, run-of-the-mill enterprise storage company. There’s definitely more than meets the eye.

On top of the enterprise storage array platform is Pure Storage’s Purity Operating Environment (POE). POE focuses on 3 key storage services:

  • High Performance Data Reduction
  • Mission Critical Reliability
  • Predictable Sub-millisecond Performance

After going through the deep-dive videos by Pure Storage’s CTO, John Colgrove, it is clear that they are very much banking the success of their solution on SSDs. Everything they have done is built around SSDs. For example, to achieve a larger capacity as well as a much cheaper $/GB, they use data reduction techniques: global deduplication, high compression and fine-grained thin provisioning at 512-byte granularity. By trading off IOPS (which SSDs have in abundance, since they are several times faster than conventional spinning disks), a larger usable capacity is achieved.
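A rough Python sketch of how inline, block-level deduplication plus compression turns spare IOPS into usable capacity. It is an illustration only: `DedupStore` and its methods are hypothetical, SHA-256 fingerprints and zlib are stand-ins, and nothing here describes how Purity actually implements data reduction.

```python
from __future__ import annotations

import hashlib
import zlib

BLOCK_SIZE = 512  # the 512-byte granularity mentioned above


class DedupStore:
    """Toy inline dedup + compression store; illustrative, not Pure Storage's design."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}       # fingerprint -> compressed block
        self.volumes: dict[str, list[bytes]] = {}  # volume -> ordered fingerprints
        self.logical_bytes = 0                     # bytes written by hosts

    def write(self, volume: str, data: bytes) -> None:
        refs = self.volumes.setdefault(volume, [])
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            fp = hashlib.sha256(block).digest()    # content fingerprint
            if fp not in self.chunks:              # store each unique block once
                self.chunks[fp] = zlib.compress(block)
            refs.append(fp)
            self.logical_bytes += len(block)

    def read(self, volume: str) -> bytes:
        # Every read pays for lookups and decompression: this is the
        # performance being traded away for capacity.
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.volumes[volume])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def reduction_ratio(self) -> float:
        return self.logical_bytes / self.physical_bytes()


store = DedupStore()
image = b"\x00" * 4096 + bytes(range(256)) * 8  # highly redundant sample data
store.write("vol1", image)
store.write("vol2", image)                       # a clone: fully deduplicated
assert store.read("vol2") == image
print(f"unique blocks: {len(store.chunks)}, reduction {store.reduction_ratio():.1f}:1")
```

The sample data is deliberately redundant; real-world reduction ratios depend entirely on the workload.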

In their RAID 3D, they also incorporated several high-reliability techniques and data integrity algorithms designed specifically for SSDs. One point that was made is that traditional RAID, especially the parity-based RAID levels, was originally designed to protect against an entire device failing. In SSDs, however, failure does not necessarily affect the entire device. Because of the way SSDs are built, failure hotspots tend to occur at the much more granular bit level. The erase-then-write technique inherent in NAND Flash SSDs causes the bit error rate (BER) of the device to go up as it ages. Therefore, it is more likely to get a read/write error from within the SSD’s memory itself than to have the entire SSD device fail. Pure Storage RAID 3D is meant to address such occurrences of bit errors.
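The difference between protecting against a failed device and a flipped bit can be sketched in Python: each block carries a checksum, so a bit error is detected on read and only that one block is rebuilt from parity. This is a generic checksum-plus-XOR-parity illustration with hypothetical names (`Stripe`, `xor_blocks`); it is not a description of RAID 3D, whose internals are Pure Storage’s own.

```python
from __future__ import annotations

import zlib
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (single parity, as in RAID-5)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """Data blocks plus one parity block, with a CRC32 kept for every data block."""

    def __init__(self, data_blocks: list[bytes]) -> None:
        self.data = [bytearray(b) for b in data_blocks]
        self.crcs = [zlib.crc32(b) for b in data_blocks]
        self.parity = xor_blocks(data_blocks)

    def read(self, i: int) -> bytes:
        block = bytes(self.data[i])
        if zlib.crc32(block) == self.crcs[i]:
            return block
        # Checksum mismatch: a bit error inside this one block. Rebuild just
        # this block from the parity and the other blocks; no device rebuild.
        others = [bytes(b) for j, b in enumerate(self.data) if j != i]
        rebuilt = xor_blocks(others + [self.parity])
        if zlib.crc32(rebuilt) != self.crcs[i]:
            raise IOError(f"block {i} is unrecoverable")
        self.data[i][:] = rebuilt  # repair the bad block in place
        return rebuilt


stripe = Stripe([b"AAAA", b"BBBB", b"CCCC"])
stripe.data[1][2] ^= 0x04          # simulate a single flipped bit
assert stripe.read(1) == b"BBBB"   # detected by CRC and repaired from parity
```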

I spoke a bit about storage tiering earlier in this article because every corporation employs storage tiering to be financially responsible. However, John Colgrove’s argument was: why tier the storage when there are plenty of IOPS and the $/GB is comparable to spinning disks? That holds true only when the $/GB of SSDs can match the $/GB of spinning disks. Factors we must also take into account are the rack-space savings from the smaller profile of SSDs, and the power savings of SSDs versus conventional HDD-based enterprise storage arrays. Taken in their entirety, there are strong indications that the $/GB of SSD-based systems will match, or perhaps even undercut, the $/GB of HDD-based systems. And since present-day applications have not demanded super-high IOPS, and multi-core processing is cheap, there’s plenty of headroom for Pure Storage and other similar enterprise storage array companies to grow.
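The $/GB argument above is simple arithmetic once data reduction and power are included. A minimal sketch, assuming a single, uniform data reduction ratio and a flat electricity price; `effective_cost_per_gb` is a hypothetical helper, and every number in the example is made up for illustration, not taken from any vendor.

```python
def effective_cost_per_gb(system_price: float, raw_gb: float,
                          reduction_ratio: float = 1.0, watts: float = 0.0,
                          years: float = 3.0, price_per_kwh: float = 0.0) -> float:
    """Price per usable GB after data reduction, optionally adding power cost."""
    power_cost = watts / 1000 * years * 365 * 24 * price_per_kwh
    usable_gb = raw_gb * reduction_ratio
    return (system_price + power_cost) / usable_gb


# Hypothetical figures, for illustration only.
ssd = effective_cost_per_gb(100_000, 10_000, reduction_ratio=5.0,
                            watts=500, price_per_kwh=0.10)
hdd = effective_cost_per_gb(60_000, 30_000, reduction_ratio=1.0,
                            watts=1_500, price_per_kwh=0.10)
print(f"SSD array: {ssd:.2f}/GB, HDD array: {hdd:.2f}/GB")
```

Rack space, cooling and support costs are left out; they could be added as extra terms in the same way as power.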

The tides are changing for the storage industry and it is good to see a start-up like Pure Storage boldly coming forth to announce their backing for SSDs. It’s good for the consumer and good for the industry. But more importantly, they are driving innovation and making us rethink how we build storage arrays. I am looking forward to more things to come.

Having fun with your storage vendor and getting the information to fit your data center

I was on my way to Singapore yesterday. At the departure lounge, I started reading “Data Center Storage” by Hubbert Smith (ISBN#: 978-1439834879) and learned something very interesting immediately. Then my thoughts started stirring and I thought I’d have a bit of fun with what I had learned from the book.

The single most significant piece of the storage solution is the hard disk drive (HDD). Regardless of SAN or NAS protocols, the data is stored on and served from the hard disk drives. And there are 4 key metrics of an HDD:

  • Price
  • Performance
  • Capacity
  • Power

As storage professionals, we are often challenged to deliver the best storage solution to meet the customer’s requirements. Therefore, it is not about providing the fastest IOPS or the best availability or the lowest price. It is about providing the best balance of the 4 key metrics above.

The 4 metrics are of little help on their own, but if they are combined in relation to each other, you as a customer can obtain measurable ratios that are useful for sizing to a requirement. These ratios keep the balance of the 4 key metrics well defined, rather than leaving you with fluff and BS from the storage vendor.

In the book, the following table was displayed and I found it to be extremely useful:

Key Ratios for HDDs

Ratio               Units
Performance/Price   IOPS/$
Performance/Power   IOPS/watt
Capacity/Price      GB/$
Capacity/Power      GB/watt

These relational ratios are going to be useful in determining the right type of storage for the requirement, and we will come back to them later. We begin our quest to obtain the information that we want – Performance, Capacity, Price, Power.

Capacity is the easy one, because the size of each HDD is a given.

IOPS for each type of HDD is also easy to obtain. See the table below:

Disk Type   RPM      IOPS Range
SATA        5,400    50-75
SATA        7,200    75-100
SAS/FC      10,000   100-125
SAS/FC      15,000   175-200
SSD         N/A      5,000-10,000

The wattage of each HDD is also quite easy to get. Just ask the vendor for the specifications of the HDDs.

The pricing is where we can have a bit of fun with the storage vendor. Usually, storage vendors do not show the price of a single HDD in the quotation. The total price is lumped together with everything else, making it harder to decipher. So, what can the customer do?

Easy. Get 4-5 quotations from the storage vendor, each with a different type of HDD. This is the customer’s right. For example, I have created several fictitious quotations, each with a different type of HDD/SSD and pricing.

Quote #1 (SATA 7200 RPM)

 

Quote #2 (SAS 10,000 RPM)

 

Quote #3 (SAS 15,000 RPM)

 

Quote #4 (SSD)

 

From the 4 quotations, we cannot ascertain the true price of a single disk, but we can assume that the 12 HDDs/SSDs take up 50% of the entire quotation. All other things being equal, especially the quantity of 12, we can establish a very rough estimate of the price. Having fun making the storage vendor run around with the quotations is an added bonus.

But we can derive the following figures (rough estimates, but useful when we apply them to the key ratios above):

1TB SATA = 3333.33; 300GB 10,000 RPM SAS = 5000.00; 300GB 15,000 RPM SAS = 6250.00; 100GB SSD = 10416.66
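The quotation trick boils down to one line of arithmetic. A small Python sketch, assuming (as above) that the 12 disks make up 50% of each quote; the quote totals in the example are hypothetical, back-calculated so that the output reproduces the per-disk figures just listed.

```python
DISK_SHARE = 0.5  # assumption from the text: the disks are ~50% of each quote
DISK_COUNT = 12   # every quote asks for the same quantity of disks


def estimate_disk_price(quote_total: float) -> float:
    """Rough per-disk price: the disks' share of the quote, split evenly."""
    return quote_total * DISK_SHARE / DISK_COUNT


# Hypothetical quote totals, back-calculated so the results match the figures
# above; substitute the totals from your own vendor quotations.
quotes = {
    "1TB SATA 7,200 RPM": 80_000,
    "300GB SAS 10,000 RPM": 120_000,
    "300GB SAS 15,000 RPM": 150_000,
    "100GB SSD": 250_000,
}
for disk, total in quotes.items():
    print(f"{disk:22} ~ {estimate_disk_price(total):,.2f}")
```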

When we juxtapose the information that we have collected, i.e. price, performance and capacity (ok, I am skipping power/watt because I was too lazy to find out), we come up with the table below:

In the boxed area, we can now easily determine which HDDs/SSDs give the best value for money, in either Performance/$ or Capacity/$. The higher the key ratio, the better the value.
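The key-ratio table can be recomputed from the figures above. A sketch, assuming the IOPS for each disk is the midpoint of its range in the earlier IOPS table (the exact values behind the original table are not shown), with 1TB counted as 1,000GB; power ratios are left out, just as in the text.

```python
from __future__ import annotations

# (capacity in GB, IOPS, estimated price per disk). Prices are the rough
# estimates above; IOPS are midpoints of the ranges in the IOPS table (assumed).
disks = {
    "1TB SATA 7,200 RPM": (1000, 87.5, 3333.33),
    "300GB SAS 10,000 RPM": (300, 112.5, 5000.00),
    "300GB SAS 15,000 RPM": (300, 187.5, 6250.00),
    "100GB SSD": (100, 7500, 10416.66),
}


def key_ratios(capacity_gb: float, iops: float, price: float) -> dict[str, float]:
    """Performance/Price and Capacity/Price; power ratios skipped, as in the text."""
    return {"IOPS/$": iops / price, "GB/$": capacity_gb / price}


print(f"{'Disk':22} {'IOPS/$':>8} {'GB/$':>8}")
for name, spec in disks.items():
    r = key_ratios(*spec)
    print(f"{name:22} {r['IOPS/$']:8.4f} {r['GB/$']:8.4f}")

best_perf = max(disks, key=lambda d: key_ratios(*disks[d])["IOPS/$"])
best_cap = max(disks, key=lambda d: key_ratios(*disks[d])["GB/$"])
print("best IOPS/$:", best_perf, "| best GB/$:", best_cap)
```

As expected, the SSD wins on Performance/$ while the big SATA disk wins on Capacity/$, which is exactly the trade-off the ratios are meant to expose.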

From this, the customer can now determine methodically which type of disk he should invest in to get the best value.

This is just a very simplistic method to find the value of the storage solution to be purchased. Bear in mind that there are many other factors to consider as well, such as rack unit height, total power consumption, storage efficiency, data protection and many more.

I am not taking credit for what Hubbert Smith has proposed. All kudos to him, but I am applying his method to what is relevant to us in the field.

In conclusion, the customer won’t be baffled and confused into thinking that they got the best deal at the lowest price or the fastest performance. This crude method can help turn perception into something more concrete and analytical. It’s time we, as customers, know our rights, know what we are buying into, and have a bit of fun with the storage vendor too.

Copy-on-Write and SSDs – A better match than other file systems?

We have been taught that file systems are like folders, sub-folders and eventually files. The criteria in designing file systems are to ensure a few key features:

  • Ease of storing, retrieving and organizing files (sounds like a fridge, doesn’t it?)
  • Simple naming convention for files
  • Performance in storing and retrieving files – hence our write and read I/Os
  • Resilience in restoring full or part of a file when there are discrepancies

In file system performance design, one of the most important factors is locality. By locality, I mean that the data blocks of a particular file should be as close together as possible. Hence, most file system designs that originated from the Berkeley Fast File System (BFFS) require the file system to seek to the data block being modified and update it in place to preserve locality, i.e. you try not to break up the contiguity of the data blocks. The seek to find the required data block takes time, but you are compensated with faster reads, because the read-ahead feature lets you read extra blocks ahead in anticipation that the data blocks are related.

In Copy-on-Write file systems (also known as shadow-paging file systems), the seek portion is usually not present because the newly modified block is written somewhere else, not at the present location of the original block. This is the foundation of Copy-on-Write file systems such as NetApp’s WAFL and Oracle Solaris ZFS. Because the new data blocks are written somewhere else, the storing (write operation) portion is faster. It eliminates the seek time, and it also skips the read-modify-write action at the original location of the data block. Therefore, writes are likely to be faster.

However, the read portion will be slower, because to read a file, the file system has to go around looking for the data blocks, which lack locality. Therefore, as a COW file system ages, it tends to have higher file system fragmentation. I wrote about this in my previous blog post. It is a case of ENJOY-FIRST/SUFFER-LATER. I am not writing this to say that COW file systems are bad. Obviously, NetApp and Oracle have done enough homework to make their file systems among the better storage file systems in the market.
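To see why a COW file system fragments as it ages, here is a toy Python simulation that counts how many contiguous extents a file has after a series of random block updates. It assumes a single file and a log-style allocator that always writes the new version to the next free block; real WAFL and ZFS allocators batch writes and work hard to limit fragmentation, so treat the numbers as illustrative only.

```python
from __future__ import annotations

import random


def count_extents(block_map: list[int]) -> int:
    """Number of contiguous runs in a file's logical-to-physical block map."""
    return 1 + sum(1 for a, b in zip(block_map, block_map[1:]) if b != a + 1)


def simulate(n_blocks: int = 1000, n_updates: int = 500, seed: int = 42):
    rng = random.Random(seed)
    in_place = list(range(n_blocks))  # file starts contiguous at blocks 0..n-1
    cow = list(range(n_blocks))
    next_free = n_blocks              # COW places new versions in free space
    for _ in range(n_updates):
        i = rng.randrange(n_blocks)   # modify a random block of the file
        # Update-in-place (FFS-style): seek to physical block in_place[i] and
        # overwrite it there, so the block map never changes.
        # Copy-on-write: write the new version elsewhere and repoint the map.
        cow[i] = next_free
        next_free += 1
    return count_extents(in_place), count_extents(cow)


in_place_extents, cow_extents = simulate()
print(f"update-in-place: {in_place_extents} extent(s), copy-on-write: {cow_extents} extents")
```

Sequential reads and read-ahead work best when there is one long extent; every extra extent is potentially another seek on a spinning disk.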

So, that’s Copy-on-Write file systems. But what about SSDs?

Solid State Drives (SSDs) will make enemies of file systems that prefer locality. Remember that some file systems prefer their data blocks to be contiguous? Well, SSDs employ “wear-leveling”, which requires writes to be spread out as much as possible across the SSD device to reduce wear-and-tear and prolong its life. That’s not good news, because SSDs just told the file systems, “I don’t like locality and I will spread out the data blocks”.

NAND Flash SSDs (the common ones we find in the market, not DRAM-based SSDs) are funny creatures. When you write to SSDs, you must ERASE first, then WRITE AGAIN. This is the part that creates the wear-and-tear of the device. Here is what I mean by ERASE first, WRITE AGAIN:

  • Writing 1 –> 0 (OK, no problem)
  • Writing 0 –> 1 (not OK, because NAND Flash can’t do that)

So, what does the SSD do? It ERASES the whole erase block, setting every bit in it to 1, and then programs the required bits to 0. Crazy, isn’t it? The firmware in the SSD controller also spreads the erase-then-write operations across the entire SSD device to avoid concentrating them on a small location or dataset. This is the “wear-leveling” we often hear about.
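The ERASE-first, WRITE-AGAIN rule and wear-leveling can be modelled in a few lines of Python. This is a simplified sketch: real NAND programs whole pages and erases larger erase blocks, and real wear-leveling lives in the controller’s flash translation layer. Here a single hypothetical `NandBlock` plays both roles, and `rewrite` uses the naive policy of always picking the least-erased block.

```python
from __future__ import annotations


class NandBlock:
    """Toy NAND erase block: programming may only clear bits (1 -> 0)."""

    def __init__(self, size: int = 4) -> None:
        self.cells = bytearray(b"\xff" * size)  # erased state: every bit is 1
        self.erase_count = 0

    def program(self, data: bytes) -> None:
        if len(data) > len(self.cells):
            raise ValueError("data larger than block")
        for old, new in zip(self.cells, data):
            if new & ~old & 0xFF:               # a bit would need to go 0 -> 1
                raise ValueError("cannot program 0 -> 1 without an erase")
        self.cells[:len(data)] = bytes(o & n for o, n in zip(self.cells, data))

    def erase(self) -> None:
        self.cells[:] = b"\xff" * len(self.cells)  # whole block back to all 1s
        self.erase_count += 1


def rewrite(blocks: list[NandBlock], data: bytes) -> NandBlock:
    """Erase-then-write with naive wear-leveling: use the least-erased block."""
    target = min(blocks, key=lambda b: b.erase_count)
    target.erase()
    target.program(data)
    return target


blk = NandBlock()
blk.program(b"\x0f\x0f\x0f\x0f")         # 1 -> 0 only: allowed
try:
    blk.program(b"\xf0\xf0\xf0\xf0")     # needs 0 -> 1: rejected
except ValueError as err:
    print("rejected:", err)

pool = [NandBlock() for _ in range(3)]
for _ in range(6):
    rewrite(pool, b"\x12\x34\x56\x78")
print("erase counts:", [b.erase_count for b in pool])
```

Because every rewrite goes to the least-worn block, the erase counts stay even across the pool instead of piling up on one location.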

Since SSDs shun locality and avoid keeping data blocks close together, and Copy-on-Write file systems already do this by their nature of writing new data blocks somewhere else, the combination of a COW file system and SSDs seems like a very good fit. It even looks symbiotic, because it is a case of “I help you; and you help me”.

From this perspective, the benefits of COW file systems on SSDs extend beyond the resiliency of the SSD device to performance as well. Since the data blocks are spread out across different locations in the SSD device, the resulting parallelism will incidentally help with COW’s performance. Makes sense, doesn’t it?

I have not learned about other file systems and how they behave with SSDs, but it is pretty clear that Copy-on-Write file systems work well with Solid State Drives. Have a good week ahead :-)!

Will SAN or NAS matter if your customer’s storage is in the Cloud?

An interesting question popped into my head yesterday. With all this push into the Cloud, the customer does not own most of the computer equipment. They are just getting services and when they want storage, do you think they care whether their storage is on a SAN or NAS?

As I have mentioned before, the Cloud makes a lot of IT stuff irrelevant. Read my previous blog post. This means that the demand for IT techies, sysadmins and consultants will suddenly be squeezed into who’s very good, good, not-so-good and the downright bad ones. Let the survival-of-the-fittest games begin!

Yes, the SAN and NAS, or even unified storage, story doesn’t hold much weight anymore. However, cloud service providers will be out there looking for what is best for their bottom line, whether that is a branded box or just a white box if they are willing to build the storage on their own. For providers with strong financials, investing in premium brands like EMC, IBM, NetApp and so on obviously makes sense, because they need someone to blame and penalize when the shit hits the fan. For those who don’t have the financial prowess, this presents a whole new economy that resellers, partners and distributors can tap into: build for these cloud providers at a cheaper price (hint, hint).

However, such storage relies on a strong storage operating system. There are plenty of open source ones. Hey, you can practically build a simple iSCSI or NAS box with Linux. Consumer-grade NAS vendors such as Netgear, Synology and D-Link have been using open-source Linux to penetrate the low-end, home storage market for years. The cloud providers will be a different ballgame, but the storage piece is fundamentally the same.

Things are changing folks, and for those consultants, product pre-sales, post-sales, sysadmins, operators of storage, you have to evolve to meet this new market. SAN and NAS do not matter anymore when customers are using the cloud services.

p/s: I have been spending time looking at some very, very cool cloud-ready storage operating systems. If you have the time, leave me a comment and we’ll talk. 😀

HP likely to buy Autonomy

Just saw the news. Here’s the link – http://www.themalaysianinsider.com/business/article/hp-may-drop-pcs-to-buy-autonomy-for-us11.7b/

HP could buy British software maker Autonomy. Just months ago, Autonomy acquired Iron Mountain Digital’s assets, with solutions such as Connected Backup and the former Mimosa NearPoint.

Again, there is no value in the PC business anymore and it is only logical that HP is focusing on high value and high margin IT solutions. More news could follow.

Storage jobs are paid higher

The human capital is important in the IT industry. Yet we are facing a situation where there is a steady supply of storage-related jobs, but the supply of human resources and skills for these jobs is seriously lacking. Even if many people apply for these positions, the good ones are few and far between.

What’s happening? Malaysia has been stuck in a rut for quite a few years now trying to raise good quality human capital, especially in the IT sectors. God knows how hard agencies such as MDeC and other IT bodies have been trying to increase the awareness and supply of good quality IT people for the IT industry. In storage, the situation is even more acute because storage has been seen as one of the unglamorous jobs. That is why there are likely to be more networking techies than storage techies.

We all know IT is all about data and information and the data and information are created, stored, modified, stored-again, replicated, migrated, archived and deleted IN STORAGE. Data has to reside in storage and memory before it can be used. And don’t forget that memory is temporary, volatile storage. It’s plain and simple – data has to be in some form of storage before it can be used.

I have been fairly disturbed by the fact that storage remains one of the most important foundations of IT and yet, the pool of good storage networking and data management professionals is seriously shallow, especially in Malaysia. The good ones are out there, kept as prized assets of the company they work for. But cloud computing is here, and the demand for storage professionals is greater than ever.

I went out and did some research, using salary as the main criterion for storage jobs. And here’s what I found –>

 

The information source of the above chart is Certification Magazine Salary Survey 2009. I hope the table isn’t too small to read but here’s what I can summarize in the table below. The rows in red are the storage-related jobs.

 

Yes, the information is a bit old but it tells a tale. Storage professionals’ salaries, and the value that storage certifications carry, are in the upper echelons of the pay scale. At the same time, during my research I also found this page of information from Foote Partners LLC – 2011 IT Skills & Certification Pay Index.

And again, the pay premium for storage certifications is usually higher, in percentage terms, than the median pay premium.

However, all this information is from the US, where skills and experience are valued highly because they drive innovation and sales.

Sadly, this is not always true in Malaysia because the IT economy in Malaysia has not reached the level of innovation that drives new technologies in the country’s economy. I think this is all our fault. Why? Here’s what I think:

  • We go for the easy ones
  • We don’t want to learn anymore after we started working in IT
  • We don’t set high targets for ourselves
  • We accept things as they are – something akin to apathy
  • We have other things on our mind – like politics, inflation and so on
  • We don’t innovate

And this is something that saddens me.

A few years ago, I was at a local FOSScon where the open source geeks and nerds and gurus convene. I was there for 2 reasons – to have a bit of fun, but more importantly, I was looking for people who had skills with kernels and file systems. Sadly, I found none after 2 days at the event. Almost all the developers I spoke to were developing in PHP, MySQL, Python, Ruby-on-Rails and so on. This was a clear signal that most Malaysian developers were taking the easy way out (point #1 above). No one was programming in C or C++, or working on hardcore stuff like device drivers, networking protocols and so on.

Since we did not go out and outdo ourselves and innovate, we did not create an innovative IT economy that is the key for creating demand. Most IT companies in Malaysia would prefer playing the pass-through game i.e. “let’s pass through this deal with this reseller”, knowing full well that the reseller is there for relationship connections, not value-add. Hence point #1 again.

I recall another incident that is also vivid in my mind. I was decommissioning some Sun JavaStations in the backroom of a premier, “multimedia” university in Melaka. I looked in on the lecture that was going on, and the instructor was teaching ApplixWord. I asked one of the students and he told me that the course was 2 (or was it 1) credit hours. It came as a big shock to me, because a premier IT multimedia university was teaching the most basic of the basics of word processing. A university is supposed to be the institution that ignites creativity and innovation, but this university was drilling its students on word processing. No wonder our IT economy sucks, because we set such f*cking low standards for ourselves (point #3).

What I would like to see is IT people go out of the box they didn’t know they were confined to in the first place, and learn/share and learn/share and learn/share. Be creative, be innovative, be bold.

I have been blessed with like-minded people who can do a good hack and build something that can compete with the big boys like EMC, IBM, HP and the like. But these people are few and far between.

Today, I am looking for more such people, people who are f*cking (pardon my French, but I am the passionate one) good with storage networking and data management stuff. I am looking for people who can innovate to create the real Silicon Valley culture in Malaysia. We don’t need fancy ministers to officiate or glamorous launch events, but the real hackers, entrepreneurial junkies and those pioneering-spirited wackos (in a good way) to define what this IT economy is all about!

What kind of IOPS and throughput do you get from RAID-5/6? – Part 2

In my previous blog entry, I mentioned the write penalty for RAID-5/6. This factor will figure heavily in the way we size the RAID-level for performance capacity planning.

It is difficult to ascertain what kind of IOPS and throughput are required for an application, especially a database, to run well with additional room to grow. A DBA or an application developer, I believe, would have adequate information about the number of users that the application can support (both average and peak), the transactions per second (TPS), the block size required for logs, database files and so on.

But as we are all aware, most of the time this type of information is not readily available. So, coming from a storage angle, the storage administrator can advise the DBA or the application developer that the configured RAID group, volume or LUN is capable of delivering a certain number of IOPS and a certain throughput in MB/sec. These numbers come straight off the box itself. Of course, other factors such as HBA speed, the FC/iSCSI configuration, the network traffic and so on will affect the overall performance delivered to the application. But we can safely inform the DBA and/or the application developer that this is what the storage delivers out of the box.

The building blocks of all storage RAID groups/volumes/LUNs are your hard disk drives (HDDs) and/or Solid State Drives (SSDs). The manufacturers of these disks will usually publish the IOPS and throughput of individual drives, but if this information is not available, we can calculate the IOPS of an individual HDD from its seek and latency times.

For example, if the HDD’s

average latency = 2.8 ms;          average read seek = 4.2 ms;              average write seek = 4.8 ms

then the IOPS can be calculated as

                                  1
         IOPS = ---------------------------------------
                (average latency) + (average seek time)

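Therefore, from the details above, taking the average seek time as the mean of the read and write seeks, (4.2 + 4.8) / 2 = 4.5 ms, and converting milliseconds to seconds: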

                    1
         IOPS = -------------------  = 136.986 IOPS
                (0.0028) + (0.0045)

That’s pretty simple, right? But of course, it is easier to just accept that a certain type of disk will have a range of IOPS as shown in the table below:

Disk Type   RPM      IOPS Range
SATA        5,400    50-75
SATA        7,200    75-100
SAS/FC      10,000   100-125
SAS/FC      15,000   175-200
SSD         N/A      5,000-10,000

The information in the table above is for reference only and is by no means very accurate, but it is good enough for us to determine the IOPS of a RAID group/volume/LUN. Let’s look at the RAID write penalty again in the table below:

RAID-level      Number of I/Os for Reads   Number of I/Os for Writes   RAID Write Penalty
0               1                          1                           1
1 (1+0, 0+1)    1                          2                           2
5               1                          4                           4
6               1                          6                           6

Next, we need to know the ratio of Reads vs Writes for that particular database or application. I mentioned earlier that for OLTP-type applications, we usually take a 2:1 or 3:1 ratio in favour of Reads.

To make things simpler, let’s assume we create a RAID-6 volume of 6 data disks and 2 parity disks, i.e. a RAID-6 (6+2) configuration. The disks used are 7,200 RPM SATA disks, each delivering 100 IOPS. Assume a ratio of 2:1 in favour of Reads, which gives us 66.666% Reads and 33.333% Writes.

Therefore, the combined raw IOPS of the 8 disks in the RAID-6 configuration is about 800 IOPS (8 x 100). However, because of the write penalty of RAID-6, the effective IOPS of the RAID-6 volume will be lower than that. Let’s do some calculations to see what happens:

1) Read IOPS + Write IOPS = 800 IOPS

2) (0.66666 x 800) + (0.33333 x 800) ≈ 800 IOPS

3) Read IOPS will be 0.66666 x 800 = 533.328 IOPS

4) Write IOPS will be 0.33333 x 800 = 266.664 IOPS. However, since RAID-6 has a write penalty of 6, this number has to be divided by 6: 266.664 / 6 = 44.444 IOPS for Writes.

Therefore, what the RAID-6 volume is capable of is approximately 533 IOPS for Reads and 44 IOPS for Writes.
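Steps 1) to 4) can be wrapped into one function. This is a sketch of the same method with hypothetical names (`effective_iops`, `WRITE_PENALTY`); it uses exact 2/3 and 1/3 shares, so the Read figure comes out as 533.33 rather than 533.328.

```python
# RAID write penalties from the table above.
WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 6: 6}


def effective_iops(raw_iops: float, reads: int, writes: int, raid_level: int) -> tuple[float, float]:
    """Split raw IOPS by a Read:Write ratio and apply the RAID write penalty.

    Read IOPS  = raw x read share
    Write IOPS = raw x write share / write penalty
    """
    total = reads + writes
    read_iops = raw_iops * reads / total
    write_iops = raw_iops * writes / total / WRITE_PENALTY[raid_level]
    return read_iops, write_iops


if __name__ == "__main__":
    raw = 8 * 100  # RAID-6 (6+2) of 7,200 RPM SATA disks at ~100 IOPS each
    r, w = effective_iops(raw, 2, 1, 6)  # 2:1 in favour of Reads
    print(f"Reads: {r:.0f} IOPS, Writes: {w:.0f} IOPS")  # Reads: 533 IOPS, Writes: 44 IOPS
```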

We have determined the IOPS of the RAID volume, but what about throughput? Throughput is the IOPS multiplied by the block size used. Assume that our RAID-6 volume uses a 4-KB block size. With a combined effective IOPS of 577 (533 + 44), we multiply the IOPS by the block size:

$$\text{Throughput} = 577\ \text{IOPS} \times 4\ \text{KB} = 2308\ \text{KB/sec}$$

Therefore, when I/O is sustained at this 4-KB block size, the effective throughput is about 2,308 KB/sec.
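The throughput step is a single multiplication; here it is as a tiny helper (a hypothetical name) so that other block sizes can be tried against the same effective IOPS.

```python
def throughput_kb_per_s(iops: float, block_size_kb: float) -> float:
    """Throughput = IOPS x block size (KB/sec when the block size is in KB)."""
    return iops * block_size_kb


if __name__ == "__main__":
    effective = 533 + 44  # rounded Read + Write IOPS from the RAID-6 example
    print(throughput_kb_per_s(effective, 4))  # 2308
```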

On the other hand, we are often told to add more spindles to a volume to increase its IOPS. This is true only up to a point: beyond it, the IOPS that can be delivered flattens out because the I/O channel to the RAID volume has become saturated. So adding more spindles does not always translate into higher IOPS.
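The tapering effect can be illustrated with a toy model in which aggregate IOPS grows linearly with the number of spindles until it hits a fixed channel ceiling. The `channel_limit_iops` value is a made-up number for illustration only: real ceilings depend on the controller, ports and cache, and real curves bend gradually rather than cutting off sharply.

```python
def delivered_iops(spindles: int, iops_per_disk: float, channel_limit_iops: float) -> float:
    """Toy model: IOPS scales with spindles until the I/O channel saturates.

    channel_limit_iops is a hypothetical ceiling, not a measured value.
    """
    return min(spindles * iops_per_disk, channel_limit_iops)


if __name__ == "__main__":
    # Hypothetical 2,000-IOPS channel with ~100-IOPS SATA disks.
    for n in (8, 16, 24, 32):
        print(n, delivered_iops(n, 100, 2_000))
```

Past 20 disks in this example, every extra spindle adds capacity but no IOPS.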

Performance sizing for a database or an application is both a science and an art. Mathematically, we can prove things to a certain degree of accuracy and confidence, but each storage platform is very different in the way it handles RAID. Newer storage platforms have proprietary RAID implementations, so nowadays it matters less which RAID level is best for the application. IBM XIV, for example, has RAID-X, which is radical in both design and implementation. NetApp will almost always say RAID-DP is the best no matter what, because RAID-DP is all NetApp.

So there is no right or wrong RAID level for an application. But it is VERY important to know what the best practices are, and my advice to everyone is to do Proofs-of-Concept, and TEST, TEST, TEST! And ASK QUESTIONS!

Don’t RAID-5/6 everything! – Part 1

It’s a beautiful Saturday morning … the sun is out, and the birds are chirping … and here I am, thinking about RAID-5/6. What’s wrong with me?

Anyway, have you ever wondered why almost all your volumes are in a RAID-5/6 configuration? Like an obedient child, the answer would probably be "Oh, my vendor said it is good for me …"

In storage, the rule is simple: applications read, applications write. Different applications have different behaviors, but typically they fall into 2 categories:

  • Random access
  • Sequential access

The next question to ask is what the Read/Write ratio (or percentage) is within the Random Access behavior, and what it is within the Sequential Access behavior.

We usually pigeonhole transactional databases such as SQL Server and Oracle as OLTP-type workloads, with random access as the dominant access method. Email applications such as Exchange, Lotus and even SMTP servers are usually lumped into the same OLTP-type category. For OLTP-type applications we typically assume a 2:1 or 3:1 ratio, heavy on Reads and lighter on Writes. Data warehouse databases tend to be more sequential.

However, even within these OLTP applications, there are also sequential access behaviors as well, as the following table for a database shows:

| Operation | Random or Sequential | Read/Write Heavy | Block Size |
|---|---|---|---|
| DB-Log | Random (Sequential in log recovery) | Write Heavy unless you are doing log recovery | 1KB – 64KB |
| DB-Data Files | Random | Read/Write mix dependent on load | 4KB – 32KB |
| Batch insert | Sequential | Write Heavy | 8KB – 128KB |
| Index scan | Sequential | Read Heavy | 8KB – 128KB |

We will look at 4 RAID levels in this scenario and see how each applies to an OLTP-type environment. These RAID levels are RAID-0, RAID-1 (1+0, 0+1 included), RAID-5 and RAID-6.

RAID-0 is the baseline: each Read or Write from the application is processed as a single disk I/O, as per normal, so the RAID penalty is 1.

In RAID-1, a read requires 1 x Read, but a write requires 2 x Writes because the write operation is mirrored. The RAID penalty is 2.

To avoid the cost of RAID-1, RAID-5 is almost always the RAID level of choice (unless you speak to those NetApp fellas). RAID-5 is a parity-based RAID, and every write requires 2 x Read (1 to read the data block and 1 to read the parity block) AND 2 x Write (1 to write the modified block and 1 to write the modified parity). Hence it has a RAID penalty of 4.

RAID-6 was introduced to address the risk in RAID-5, because disk capacities are so freaking large now (3TB drives just came out). Rebuilding a multi-TB drive takes longer, and the RAID-5 volume is at risk if a second disk fails during the rebuild. Hence the double parity of RAID-6. Unfortunately, the RAID penalty for RAID-6 is 6!
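The penalties described above are simply counts of back-end disk I/Os per small host write. This minimal sketch derives them from those counts, assuming a read-modify-write update. The RAID-6 split of 3 reads and 3 writes (data block plus the P and Q parity blocks) is the usual breakdown behind a penalty of 6 and is an assumption of this sketch; the names are hypothetical.

```python
# Back-end disk I/Os triggered by ONE small host write (read-modify-write).
# RAID-6 is assumed to read and rewrite the data block and both parity blocks.
BACKEND_IO_PER_WRITE = {
    0: {"reads": 0, "writes": 1},  # write the block
    1: {"reads": 0, "writes": 2},  # write the block to both mirrors
    5: {"reads": 2, "writes": 2},  # read data + parity, write data + parity
    6: {"reads": 3, "writes": 3},  # read data + P + Q, write data + P + Q
}


def write_penalty(raid_level: int) -> int:
    """Write penalty = total back-end I/Os needed for one host write."""
    ios = BACKEND_IO_PER_WRITE[raid_level]
    return ios["reads"] + ios["writes"]


if __name__ == "__main__":
    for level in sorted(BACKEND_IO_PER_WRITE):
        print(f"RAID-{level}: write penalty {write_penalty(level)}")
```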

To summarize the RAID write penalty,

| RAID-level | Number of I/O for Reads | Number of I/O for Writes | RAID Write Penalty |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 (1+0, 0+1) | 1 | 2 | 2 |
| 5 | 1 | 4 | 4 |
| 6 | 1 | 6 | 6 |

So, it is well known that RAID-0 has good performance for reads and writes but absolutely no protection. RAID-1 is good for random reads and writes but it is costly. RAID-5 is good for read-heavy applications (2:1 or 3:1, as mentioned), and RAID-6, errr … should be treated like RAID-5, with some additional performance penalty.
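To see why the proposed RAID level matters, the effective-IOPS method (Read IOPS plus Write IOPS divided by the write penalty) can be applied to all four levels for the same 8 disks at ~100 IOPS each and the same 2:1 workload. This is a sketch with hypothetical names; note that it holds the number of disks constant, not the usable capacity, which differs a lot between RAID-1 and RAID-5/6.

```python
# RAID write penalties from the summary table above.
WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 6: 6}


def effective_iops(raw_iops: float, read_share: float, raid_level: int) -> float:
    """Combined Read + Write IOPS after applying the RAID write penalty."""
    write_share = 1 - read_share
    return raw_iops * read_share + raw_iops * write_share / WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    raw = 8 * 100       # e.g. 8 x 7,200 RPM SATA disks at ~100 IOPS each
    read_share = 2 / 3  # 2:1 in favour of Reads (OLTP-type)
    for level in sorted(WRITE_PENALTY):
        print(f"RAID-{level}: ~{effective_iops(raw, read_share, level):.0f} IOPS")
```

With these inputs it prints roughly 800, 667, 600 and 578 IOPS for RAID-0, RAID-1, RAID-5 and RAID-6 respectively; the gap widens as the write share grows.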

With that in mind, a storage administrator must question why a particular RAID level was proposed for the database or any similar application.

I am going out to enjoy the Saturday now … and today, August 13th, is International Left-Handers Day. More about the RAID penalty and IOPS in my next entry.