Using simple MTBF to determine reliability to Finance

The other day, a prospect was requesting quotation after quotation from a friend of mine to make a so-called “apples-to-apples” comparison with another storage vendor. But it was difficult to make that sort of comparison because one vendor would propose SAS, the other SATA, and so on. My friend roped me in to help. So in the end I asked this prospect which of these 3 criteria mattered most to him: Performance, Capacity or Reliability.

He gave me an answer, and reliability was his leading requirement. Then he asked me if I could help determine reliability in a “quick-and-dirty manner”, using the MTBF (Mean Time Between Failures) of the disks, to convince his finance people.

Well, most HDD vendors publish MTBF as a measuring stick for the reliability of the disks they manufacture. MTBF is by no means accurate, but it is useful for defining HDD reliability in a crude manner. If you have seen the components that go into an HDD, you would be amazed at the tremendously stressful environment they operate in. The read/write head flies above the platter surface at a height (head gap) thinner than a human hair, and the servo-controlled technology maintains a constant, never-lagging 7200/10,000/15,000 RPM day after day, month after month, year after year. And yet, we seem to take the HDD for granted, rarely thinking about how much technology goes into it on a nanoscale. That’s technology at its best: taking something so complex and making it so simple for all of us.

I found that the MTBF of the Seagate Constellation.2 Enterprise-class 3TB 7200 RPM disk is 1.2 million hours, while that of the Seagate Cheetah 600GB 10,000 RPM disk is 1.5 million hours. So, the Cheetah is about 25% more reliable than the Constellation.2, right?

Wrong! There are other factors involved. To achieve 3TB usable with RAID 1 (average write performance, very good read performance), you would need 2 units of the 3TB 7200 RPM disks. On the other hand, with 10,000 RPM disks, whose largest shipping capacity is 600GB, you would need 10 units. RAID-DP (this is NetApp, by the way) would give average write performance (better than RAID 1 in some cases) and very good read performance (for sequential access).

So, I broke down the 2 options above for this prospect (to achieve 3TB usable):

  1. Seagate Constellation.2 3TB 7200 RPM HDD MTBF is 1.2 million hours x 2 units
  2. Seagate Cheetah 600GB 10,000 RPM HDD MTBF is 1.5 million hours x 10 units

By using a simple calculation of

    RF (Reliability Factor) = MTBF/#HDDs

the prospect will be able to determine which of the 2 HDD types above could be more reliable.
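The RF formula has a simple basis, assuming (as MTBF figures implicitly do) that each disk fails at a constant rate and independently of the others. Under that assumption, the failure rates of N identical disks add up, so RF is the expected time until the first disk in the set fails. It tells you how often a disk will need replacing, not how likely data loss is. Data loss depends on the RAID level and the rebuild time.

```latex
\lambda_{\text{disk}} = \frac{1}{\text{MTBF}}, \qquad
\lambda_{\text{set}} = N \, \lambda_{\text{disk}} = \frac{N}{\text{MTBF}}, \qquad
\text{RF} = \frac{1}{\lambda_{\text{set}}} = \frac{\text{MTBF}}{N}
```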

In case #1, RF is 600,000 hours, and in case #2, RF is 150,000 hours. Suddenly you can see that the Constellation.2 HDDs, despite their lower MTBF, have a higher RF than the Cheetah HDDs. Quick and simple, isn’t it?
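The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not a vendor tool. `reliability_factor` and `expected_failures_per_year` are hypothetical helper names. The failures-per-year figure is an added illustration that assumes 24x7 operation and a constant failure rate, which is the same crude assumption behind MTBF itself.

```python
# Hypothetical helpers; MTBF figures are the Seagate numbers quoted in the text.
HOURS_PER_YEAR = 24 * 365  # assumes the disks spin 24x7


def reliability_factor(mtbf_hours: float, num_disks: int) -> float:
    """RF = MTBF / #HDDs: mean hours until the first disk in the set fails."""
    return mtbf_hours / num_disks


def expected_failures_per_year(mtbf_hours: float, num_disks: int) -> float:
    """Approximate disk failures per year for the whole set (constant failure rate assumed)."""
    return HOURS_PER_YEAR / reliability_factor(mtbf_hours, num_disks)


# (MTBF in hours, number of disks needed for 3TB usable), as in the list above
configs = {
    "Constellation.2 3TB 7200 RPM x 2": (1_200_000, 2),
    "Cheetah 600GB 10,000 RPM x 10": (1_500_000, 10),
}

for name, (mtbf, n) in configs.items():
    rf = reliability_factor(mtbf, n)
    per_year = expected_failures_per_year(mtbf, n)
    print(f"{name}: RF = {rf:,.0f} hours, ~{per_year:.3f} disk failures/year")
```

Run as-is, it reports an RF of 600,000 hours for the two Constellation.2 disks and 150,000 hours for the ten Cheetah disks. Under the same assumptions, the Cheetah set can be expected to see about four times as many disk failures per year.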

Note that I did not bring SAS versus SATA into the mix because it doesn’t matter here. SAS and SATA are merely data channels that move data in and out of the spinning HDDs. So, folks, don’t be fooled into thinking that a SAS drive is more reliable than a SATA drive. Sometimes they are just the same old spinning HDDs. In fact, the Seagate Constellation.2 HDD mentioned above (3TB, 7200 RPM) is available with both SAS and SATA interfaces.

Of course, this is just one factor in the whole Reliability universe. Other factors, such as RAID level, checksums, CRC, and single or dual controllers, also determine the reliability of the entire storage array.

In conclusion, we all know that MTBF alone does not determine the reliability of the solution the prospect is about to purchase. But it is one way to help the finance people get an idea of reliability.

About cfheoh

I am a technology blogger with 20+ years of IT experience. I write heavily on technologies related to storage networking and data management because that is my area of interest and expertise. I introduce technologies with the objective of getting readers to *know the facts* and use that knowledge to cut through the marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress. I am involved in SNIA (Storage Networking Industry Association), and as of October 2013, I have been appointed as the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I was previously the Chairman of SNIA Malaysia until Dec 2012. As of August 2015, I am returning to NetApp to be the Country Manager of Malaysia & Brunei. Given my present position, I am not obligated to write about my employer and its technology, but I am indeed subject to the company’s Social Media Guidelines. Therefore, I would like to make a disclaimer that what I write is my personal opinion, and mine alone. I am responsible for what I say and write, and this statement indemnifies my employer from any damages.

4 Responses to Using simple MTBF to determine reliability to Finance

  1. Alex Tien says:

    Your calculation is right if it’s only about reliability. However, 10x600GB SAS disks would significantly outperform 2x3TB SATA disks.

    Also, I’ve heard that the practice is for disk platters to go through an evaluation during manufacturing, with higher-quality platters put into SAS drives and lower-quality ones into SATA drives. This is analogous to Intel sorting its chips into different GHz speeds. If someone from the HDD industry can confirm this, that would be great.

    Just giving an alternate picture here :).

    • cfheoh says:

      Tien,

      There are many factors involved, but the prospect needs a logical way to appease the Finance side. Given the gamut of variables affecting reliability, it is difficult to simplify them all, e.g. trying to explain RAID 5/RAID 1 reliability to non-technical people.

      However, there are also other factors. You mentioned performance. Let me pose a question.

      Would a RAID 1 configuration of 2 x 7200 RPM disks lose to a RAID-DP configuration of 10 x 10,000 RPM disks in terms of IOPS for random reads and writes? Likely, but by what percentage? Can we come up with some kind of $ figure for a true calculation?

      If you are interested, I am willing to sit down with you to work on some kind of methodology to approach this matter logically. How about that?

      Thanks
      /Chin Fah

    • cfheoh says:

      Hi Tien

      As I mentioned in the blog, the Seagate Constellation.2 3TB 7200 RPM comes in both SAS and SATA versions, so there is no difference here whether it is SAS or SATA. I believe you mean that the higher-quality materials go into the 10K and 15K RPM disks, because the platters have to be read at a much higher RPM than 7200 RPM. However, if you observe, the higher RPM disks tend to have lower capacities than the lower RPM disks, though their capacities will eventually increase.

      Whether you are using SAS or SATA does not really matter, because it is just a matter of 3Gbps or 6Gbps (for the present specifications of SATA and SAS respectively). However, SAS drives are dual-ported, and their firmware has more resilient features such as queue tagging and so on.

      Just sharing my 2 bits.

      Thanks
      /Chin Fah

  2. Pingback: Don’t just look at disk reliability! « Storage Gaga
