Performance benchmarks – the games that we play

First of all, congratulations to NetApp for beating EMC Isilon in the latest SPECsfs2008 benchmark for NFS IOPS. The news is everywhere, and here is one report.

EMC Isilon was blowing its own horn several months ago when it hit 1,112,705 IOPS on a 140-node S200 cluster with 3,360 disk drives and an overall response time of 2.54 msecs. Last week, NetApp became top dog, pounding its chest with 1,512,784 IOPS on 24 FAS6240 nodes with an overall response time of 1.53 msecs. There were 1,728 450GB, 15,000rpm disk drives, and the FAS6240s were fitted with Flash Cache.

As with every benchmark that you and I have seen, before and after, every storage vendor tries to best the others. When one does, the horns blare, the fireworks come out and the vendor pounds its chest like Tarzan, saying “Who’s your daddy?” The euphoria usually doesn’t last long, as performance records are broken all the time.

However, performance benchmark results should not be taken at face value, because they are not true representations of real-life production environments. Two years ago, the now-defunct Byte and Switch (now part of Network Computing) reviewed a nine-year study on file system and storage benchmarking. Interestingly, the study found that benchmark results are often reduced to a single graph that carries little information about how the benchmark was conducted, how long it ran and so on.

The paper, “A Nine Year Study of File System and Storage Benchmarking”, by Avishay Traeger and Erez Zadok of Stony Brook University and Nikolai Joukov and Charles P. Wright of the IBM T.J. Watson Research Center, surveyed 415 file system and storage benchmarks from 106 published papers. The article noted:

“Based on this examination the paper makes some very interesting observations and conclusions that are, in many ways, very critical of the way “research” papers have been written about storage and file systems.”

The paper thus highlighted problems in the way benchmarks were run and the way their results were reported. Judging by the strong title of the online article that reviewed the study (“Lies, Damn Lies and File Systems Benchmarks”), benchmarks are not the pictures that say a thousand words.

Be it TPC-C, SPC-1 or SPECsfs benchmarks, I have gone through some interesting experiences myself, and there are certain tricks of the trade, just like in a magic show. Some of the very common ones I have come across are:

  • Short stroking – formatting a drive so that only the outer tracks of the disk platter are used to store data. This shortens head travel and puts the data where the platter passes under the head fastest, at the cost of usable capacity. The practice is used in I/O-intensive environments to increase performance.
  • Shortened test – performance tests that run for only several minutes to achieve the numbers, rather than for prolonged periods (which mimic real life)
  • Reporting aggregated numbers – note the number of nodes or controllers used to achieve the numbers. It is not ONE controller that achieves the numbers; the result is the aggregated performance of all the controllers combined
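To make the aggregation point concrete, here is a rough back-of-the-envelope sketch that divides the two headline results quoted earlier by their node and disk counts. The class and field names are made up for illustration, and a plain division ignores caching (the FAS6240s had Flash Cache), controller differences and workload mix, so treat the output as a sanity check rather than a fair comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One published benchmark result (figures taken from the post above)."""
    system: str
    iops: int
    nodes: int
    disks: int

    @property
    def iops_per_node(self) -> float:
        # Headline IOPS spread evenly across every node in the cluster.
        return self.iops / self.nodes

    @property
    def iops_per_disk(self) -> float:
        # Headline IOPS spread evenly across every disk drive.
        return self.iops / self.disks


RESULTS = [
    Result("EMC Isilon S200", 1_112_705, 140, 3_360),
    Result("NetApp FAS6240", 1_512_784, 24, 1_728),
]

for r in RESULTS:
    print(f"{r.system}: {r.iops_per_node:,.0f} IOPS/node, "
          f"{r.iops_per_disk:,.0f} IOPS/disk")
```

Run as is, this works out to roughly 7,948 IOPS per node and 331 IOPS per disk for the Isilon cluster, against roughly 63,033 IOPS per node and 875 IOPS per disk for the NetApp configuration. Either way, the headline figure belongs to the whole cluster, not to any single box a customer might buy.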

Hence, reproducing the published benchmark numbers in real life is usually impractical and very expensive. Unfortunately, customers are often less informed about the way benchmarks are performed and published. We, as storage professionals, have to disseminate this information.

OK, this sounds paradoxical: if I were working for NetApp, why would I tell a truth that could actually hurt NetApp sales? But I don’t work for NetApp now, and I think it is important for me to do my duty and share more information. Either way, many people switch jobs every now and then, so if you want to keep your reputation, be honest up front. It could save you a lot of work.