Let’s smoke the storage peace pipe

NVMe (Non-Volatile Memory Express) is upon us. And in the next 2-3 years, we will see a slew of new storage solutions and technology based on NVMe.

Just a few days ago, The Register released an article “Seventeen hopefuls fight for the NVMe Fabric array crown“, and it was timely. I, for one, cannot be more excited about the development and advancement of NVMe and the upcoming NVMeF (NVMe over Fabrics).

This is it. This is the one that will end the wars of DAS, NAS and SAN and unite the warring factions between server-based SAN (the sexy name differentiating old DAS and new DAS) and the networked storage of SAN and NAS. There will be PEACE.

Remember this?

nutanix-nosan-buntingNutanix popularized the “No SAN” movement which later led to VMware VSAN and other server-based SAN solutions, hyperconverged techs such as PernixData (acquired by Nutanix), DataCore, EMC ScaleIO and also operated in hyperscalers – the likes of Facebook and Google. The hyperconverged solutions and the server-based SAN lines blurred of storage but still, they are not the usual networked storage architectures of SAN and NAS. I blogged about this, mentioning about how the pendulum has swung back to favour DAS, or to put it more appropriately, server-based SAN. There was always a “Great Divide” between the 2 modes of storage architectures. Continue reading

No Flash in the pan

The storage networking market now is teeming with flash solutions. Consumers are probably sick to their stomach getting a better insight which flash solution they should be considering. There are so much hype, fuzz and buzz and like a swarm of bees, in the chaos of the moment, there is actually a calm and discerning pattern slowly, but surely, emerging. Storage networking guys would probably know this thing well, but for the benefit of the other readers, how we view flash (and other solid state storage) becomes clear with the picture below: Flash performance gap

(picture courtesy of  http://electronicdesign.com/memory/evolution-solid-state-storage-enterprise-servers)

Right at the top, we have the CPU/Memory complex (labelled as Processor). Our applications, albeit bytes and pieces of them, run in this CPU/Memory complex.

Therefore, we can see Pattern #1 showing up. Continue reading

Correcting NCQ incorrect portrayal with SSDs

A kind reader, Baruch Even, has pointed out my ignorance with SATA Native Command Queuing (NCQ) working with Solid State Drives (SSDs) in my previous blog.

In the post, I have haphazardly stated that NCQ was meant for spinning mechanical drives. I was wrong.

NCQ does indeed improve the performance of SSDs using SATA interfaces, and sometimes as much as 15-20%. I know there is a statement in the SATA Wikipedia page that says that NCQ boosted IOPS by 100% but I would take a much more realistic view of things rather than setting the expectations too high.

The typical SSD consists of flash storage spread across multiple chips, which in turn are a bunch of flash packages. Within each of the flash packages, there are different dies (as in manufacturing terminology “die”, not related to the word of “death”) that houses planes (not related to aeroplanes) and subsequently into blocks and pages.

Continue reading

Boosting Solid States beyond SATA

Lately, I have been getting deeper and deeper into low-level implementation related to storage technologies. In my previous blog, I was writing my learning adventure with Priority Flow Control (PFC) and intend to further the Data Center Bridging concepts with future blog entries.

Before I left for Sydney for a holiday last week, I got sidetracked into exciting stuff that’s happening in my daily encounters with friends and new friends. 2 significant storage related technologies fell onto my lap. One is NVMe (Non-Volatile Memory express) and the other FPGA (Field Programmable Gate Array).

While this blog is going to be about NVMe, I actually found FPGA much more exciting to me. Through conversations, I found that there are 2 “biggies” in the FPGA world, and they are designed and manufactured by Xilink and Altera. I admit that I have not done my homework on FPGA yet, having just returned from Sydney last night. I will blog about FPGA in future blogs.

But NVMe is also an important technology direction to the storage world as well.

I think most of us are probably already mesmerized by solid state drives. The bombardment of marketing, presentations, advertising and whatever else the vendors do to promote (and self-promote) solid state drives are inundating the intellectual senses of consumers and enterprises alike. And yet, many vendors do not explain both the pros and cons of integrating solid states into their IT environment. Even worse, many don’t even know the strengths and weaknesses of solid states, hence creating some exaggeration that continues to create a spiral vortex of inaccuracies. Like a self-feeding frenzy, the industry seems to have placed solid state storage as the saviour of the enterprise storage world. Go figure with that!

Continue reading

What kind of IOPS and throughput do you get from RAID-5/6? – Part 2

In my previous blog entry, I mentioned the write penalty for RAID-5/6. This factor will figure heavily in the way we size the RAID-level for performance capacity planning.

It is difficult to ascertain what kind of IOPS and throughput that are required for an application, especially a database, to run well with additional room to grow. From a DBA or an application developer, I believe they would have adequate information to tell what is the numbers of users that the application can support, both average and peak, transactions per second (TPS), block size required for logs, database files and so on.

But as we are all aware, most of the time, these types of information are not readily available. So, coming from a storage angle, the storage administrator can advise the DBA or the application developer that the configured RAID group or volume or LUN is capable of delivering a certain number of IOPS and is able to achieve a certain throughput MB/sec. These numbers will be off the box itself immediately. Of course, other factors such as HBA speed, the FC/iSCSI configurations, the network traffic and so on will affect the overall performance delivery to the application. But we can safely inform the DBA and/or the application developer that this is what the storage is delivering out of the box.

The building blocks of all storage RAID groups/volumes/LUNs are pretty much your hard disk drives (HDDs) and/or Solid State Drives (SSDs). The manufacturer of these disks will usually publish the IOPS and throughput of individual drives but if these information is not available, we can construct IOPS of an individual HDD from its seek and latency times.

For example, if the HDD’s

average latency = 2.8 ms;          average read seek = 4.2 ms;              average write seek = 4.8 ms

then the IOPS can be calculated as

                                  1
         IOPS = ---------------------------------------
                (average latency) + (average seek time)

Therefore from the details above,

                    1
         IOPS = -------------------  = 136.986 IOPS
                (0.0028) + (0.0045)

That’s pretty simple, right? But of course, it is easier to just accept that a certain type of disk will have a range of IOPS as shown in the table below:

Disk Type RPM IOPS Range
SATA 5,400 50-75
SATA 7,200 75-100
SAS/FC 10,000 100-125
SAS/FC 15,000 175-200
SSD N/A 5,000-10,000

The information from the table above is just for reference only and by no means a very accurate one but it is good enough for us to determine the IOPS of a RAID group/volume/LUN. Let’s look at the RAID write penalty again in the table below:

RAID-level Number of I/O Reads
Number of I/O for Writes
RAID Write Penalty
0 1 1 1
1 (1+0, 0+1) 1 2 2
5 1 4 4
6 1 6 6

Next, we need to know what is the ratio of Reads vs Writes for that particular database or application. I mentioned earlier that in OLTP-type of applications, we usually take a 2:1 or 3:1 ratio in favour of Reads.

To make things simpler, let’s assume we create a RAID-6 volume of 6 data disks and 2 parity disks in a RAID-6 (6+2) configuration. The disks used are SATA disks of 7,200 RPM, with each individual disk of 100 IOPS. Assume we are using a ratio of 2:1 in favour of Reads, which gives us 66.666% and 33.333% respectively for Reads and Writes.

Therefore, the combined IOPS of the 8 disks in the RAID-6 configuration is probably about 800 IOPS. However, because of the write penalty of RAID-6, the effective IOPS for the RAID-6 volume will be lower than that. Let’s do some calculation to see what happens:

1)  Read IOPS + Write IOPS = 800 IOPS

2)  (0.66666 x 800) + (0.33333 x 800) = 800 IOPS

3) Read IOPS will be 0.66666 x 800 = 533.328 IOPS

4) Write IOPS will be 0.33333 x 800 = 266.664 IOPS. However, since RAID-6 has a write penalty of 6, this number has to be divided by 6. 266.664/6 will be 44.444 IOPS for Writes

Therefore, what the RAID-6 volume is capable of is approximately 533 IOPS for Reads and 44 IOPS for Writes.

We have determined IOPS for the RAID volume but what about throughput. Throughput is determined by the block size used. Assume that our RAID-6 volume uses a 4-K block size. With a combined effective IOPS of 577 (533+44), we multiply the IOPS with the block size

     Throughput = 577 IOPS x 4-KB
                = 2308KB/sec

Therefore when I/O is sustained in a sequential manner, the effective throughput is 2308KB/sec.

On the other hand, we often were told to add more spindles to the volume to increase the IOPS. This is true, to a point, where the maximum amount of IOPS that can be delivered will taper into a flatline, because the I/O channel to the RAID volume  has been saturated. Therefore, it is best to know that adding more spindles does not always equate to a higher IOPS.

Performance sizing for a database or an application is both a science and an art. Mathematically, we can prove things to a a certain amount of accuracy and confidence but each storage platform is very different in the way they handle RAID. Newer storage platforms have proprietary RAID that nowadays, it does not matter much what kind of RAID is best for the application. Vendors such as IBM XIV has RAID-X which both radical in design and implementation. NetApp will almost always say RAID-DP is the best no matter what, because RAID-DP is all NetApp.

So there is no right or wrong to choose the RAID-level for the application. But it is VERY important to know what are the best practice are and my advice is everyone is to do Proof-of-Concepts, and TEST, TEST, TEST! And ASK QUESTIONS!