The recipe for storage performance modeling

Good morning, afternoon, evening, Ladies & Gentlemen, wherever you are.

Today, we are going to learn how to bake, errr … I mean, make a storage performance model. Before we begin, allow me to set the stage.

Don’t you just hate it when you are asked to do storage performance sizing and you don’t have a freaking idea how to get started? A typical techie would probably say, “Aiya, just use the capacity lah!”, and usually, they will proceed to size the storage according to capacity. In fact, sizing by capacity is the worst way to do storage performance modeling.

Bear in mind that storage is not a black box, although some people wish it were. Performance sizing is not black magic, because it can be approached in a very scientific and logical manner.

SNIA (Storage Networking Industry Association) has made a storage performance modeling methodology (that’s quite a mouthful), and basically simplified it into these few key ingredients. This recipe is for storage performance modeling in general, and I advise you guys out there to engage your storage vendor’s professional services. They will know their storage solutions best.

And I am going to say this to you – don’t be cheap; engage professional services to get to the experts out there. I was having a chat with a consultant just now at McDonald’s. I have known this friend of mine for about 6-7 years now and his name is Sugen Sumoo, the Director of DBORA Consulting. They specialize in Oracle and database performance tuning and performance forecasting, which is something that a typical DBA can’t do. DBORA Consulting is the kind of professional service that brings expertise and value to Oracle customers. Likewise, you have to engage your respective storage professional services as well.

In a cook book or a cooking show, you are presented with the ingredients used and in this recipe for storage performance modeling, the ingredients (in no particular order) are:

  • Application block size
  • Read and Write ratio
  • Application access patterns
  • Working set size
  • IOPS or throughput
  • Demand intensity

Application Block Size

First of all, the storage is there to serve applications. We always have to look from the applications’ point of view, not the storage’s point of view. Different applications have different block sizes. Databases typically range from 8K to 64K, and backup applications usually deal with larger block sizes. Video applications can have 256K block sizes or higher. It all depends.

The best way is to find out from the DBA, email administrator or application developers. The unfortunate thing is that most so-called technical people or administrators in Malaysia don’t have a clue about the applications they manage. So, my advice to you storage professionals: do your research on the application and assume its default block size, because these clueless fellas are likely to have left it at the default.

Read and Write ratio

Applications behave differently at different times of the day, and at different times of the month (no, it’s not PMS). At the end of the financial or calendar year, these applications run some additional tasks as well. But on a typical day, there is a particular weightage, or percentage, of read operations versus write operations.

Most OLTP (online transaction processing)-based applications tend to be read heavy and write light, but we need to find out the ratio. Typically, it can be somewhere around 2:1 or 60%:40%, but it is best to speak to the application administrators about the ratio. DSS (Decision Support Systems) and data warehousing applications could have much higher reads than writes, while a seismic-analysis application can have multiple writes during the analysis periods. It all depends.

To counter the “clueless” administrators, ask lots of questions. Find out the workflow of several key tasks and ask what each particular task does at different checkpoints of the application’s processing. If you are lazy (please don’t be lazy, because it degrades your value as a storage professional), use a rule of thumb.
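
Why does the ratio matter so much? On RAID-protected disks, a write usually costs more back-end disk I/O than a read. Here is a minimal Python sketch of that idea. It is a common rule of thumb, not part of the SNIA recipe, and it assumes the textbook “write penalty” figures (2 for RAID 10, 4 for RAID 5, 6 for RAID 6) while ignoring controller cache and full-stripe writes. The names and the sample workload are hypothetical.

```python
# Textbook write penalties: back-end disk I/Os caused by one front-end write.
# These are rule-of-thumb assumptions; real arrays with write cache behave better.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(frontend_iops, read_pct, raid_level):
    """Estimate the disk-level IOPS needed to serve an application workload.

    frontend_iops: IOPS the application issues to the storage
    read_pct:      percentage of those I/Os that are reads (0-100)
    raid_level:    one of the keys in WRITE_PENALTY
    """
    write_pct = 100 - read_pct
    reads = frontend_iops * read_pct / 100
    writes = frontend_iops * write_pct / 100
    # Reads cost one disk I/O each; writes are multiplied by the penalty.
    return reads + writes * WRITE_PENALTY[raid_level]
```

With a made-up workload of 10,000 IOPS at 60% reads, this estimate comes to 22,000 back-end IOPS on RAID 5 but only 14,000 on RAID 10. Same application, very different number of disks.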

Application access patterns

Applications behave differently in general. They can be sequential, like backup or video streaming. They can be random like emails, databases at certain times of the day, and so on. All these behavioral patterns affect how we design and size the disks in the storage.

Some RAID levels tend to work well with sequential access and others with random access. It is not difficult to find out about the applications’ patterns, and if you read more about the different RAID levels in storage, you can easily identify the RAID level suitable for each type of behavioral pattern.

Working set size

This variable is a bit more difficult to determine. The working set is the chunk of the application that has to be loaded into a working area, usually memory and cache memory, to be used and abused by the application users.

Unless someone is well versed in the application, one would not be able to determine how much of the application would be placed in memory and in cache memory. Typically, this can only be determined after the application has been running for some time.

The flexibility of having SSDs, especially the DRAM type of SSDs, is very useful to ensure that there is sufficient “working space” for these applications.

IOPS or Throughput

According to the SNIA model, for I/O smaller than 64K, IOPS should be used as the yardstick for storage performance modeling. For anything larger, use throughput, for which MB/sec is the measurement unit.

The application guy would be able to tell you what kind of IOPS their application is expecting or what kind of throughput they want. Again, ask a lot of questions, because this will help you determine the type of disks and the kind of performance you give to the application guys.

If the application guy is clueless again, ask someone more senior or ask the vendor. If the vendor engineers cannot give you an answer, then they should not be working for the vendor.
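
To make the 64K rule concrete, here is a minimal Python sketch. It assumes the simple relationship throughput = IOPS x block size, and it treats exactly 64K as “large” (the rule only says “less than 64K” for IOPS, so that boundary is my assumption). The function names and sample numbers are made up for illustration.

```python
KIB = 1024
MIB = 1024 * 1024


def sizing_metric(block_size_bytes):
    """Pick the yardstick: IOPS for I/O under 64K, throughput otherwise."""
    return "IOPS" if block_size_bytes < 64 * KIB else "MB/sec"


def throughput_mb_per_sec(iops, block_size_bytes):
    """Convert an IOPS figure into throughput for a given block size."""
    return iops * block_size_bytes / MIB


def iops_for_throughput(mb_per_sec, block_size_bytes):
    """Convert a throughput target back into the IOPS it implies."""
    return mb_per_sec * MIB / block_size_bytes
```

For example, a database doing 5,000 IOPS at 8K blocks is sized by IOPS and moves only about 39 MB/sec, while a video application asking for 400 MB/sec at 256K blocks is sized by throughput and needs just 1,600 IOPS. That is why block size has to be known before either number means anything.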

Demand intensity

This part is usually overlooked when it comes to performance sizing. Demand intensity refers to how intense the I/O requests are. They could come from 1 channel or 1 part of the application, or they could come from several parts of the application in parallel. It is as if the storage is being ‘bombarded’ by applications, and this is the part that is hard to determine as well.

In some applications, the degree of intensity or parallelism can be tuned. To find out, ask the application administrator or developer. If not, ask the vendor. Also do a lot of research on the application’s architecture.
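
One way to put a number on demand intensity is Little’s Law: the average number of I/Os in flight equals the I/O rate multiplied by the average time each I/O takes. The Python sketch below applies it. This is my own illustration rather than part of the SNIA recipe, it only holds for steady-state averages, and the function names and figures are hypothetical.

```python
def outstanding_ios(iops, latency_ms):
    """Average number of I/Os in flight for a given rate and latency.

    Little's Law: in-flight I/Os = arrival rate x time per I/O.
    """
    return iops * latency_ms / 1000.0


def max_iops(queue_depth, latency_ms):
    """Highest I/O rate sustainable with a fixed number of I/Os in flight."""
    return queue_depth * 1000.0 / latency_ms
```

So an application pushing 20,000 IOPS at 2 ms per I/O keeps about 40 I/Os in flight, and an application limited to 32 parallel requests at 4 ms each can never drive more than 8,000 IOPS, no matter how fast the disks are.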

And one last thing. What I have learned is to add buffers to the storage performance model. Typically I would add about 10-20% extra, but you never know. I would strongly encourage you, as storage professionals, to engage professional services, because it is worthwhile, especially in the early stages of the sizing. It is usually a more expensive affair to redo the sizing after the applications have been installed and are running.
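
The buffer is simple arithmetic, but it is worth writing down so nobody forgets it. Here is a minimal Python sketch, assuming the buffer is applied to the required IOPS before dividing by what a single disk can deliver. The per-disk figure of 180 IOPS used in the note below is a placeholder, not a vendor number; ask your vendor for the real one.

```python
import math


def with_headroom(required, headroom_pct):
    """Add a percentage buffer on top of a required performance figure."""
    return required * (100 + headroom_pct) / 100


def disks_needed(required_iops, iops_per_disk, headroom_pct=20):
    """Number of disks needed to deliver the required IOPS plus buffer.

    iops_per_disk is an assumed per-spindle figure supplied by the caller.
    """
    return math.ceil(with_headroom(required_iops, headroom_pct) / iops_per_disk)
```

For a requirement of 22,000 disk-level IOPS and a placeholder 180 IOPS per disk, a 20% buffer works out to 147 disks and a 10% buffer to 135 disks.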

“Failure to plan is planning to fail”.  The recipe isn’t that difficult. Go figure it out.

Go on and be a storage extraordinaire

ex·tra·or·di·naire – Outstanding or remarkable in a particular capacity

I was surfing the Internet after dinner while holidaying in Port Dickson. And at about this time, the news from my subscriptions arrives, perfectly timed as my food is digesting.

And in the news – “IDC Says Cloud Adoption Fuels Storage Sales”. You think?

We are generating so much data at this present moment that IDC is already saying that we are doubling our data every 2 years. That’s massive, and a big part of it is being fueled by our adoption of Cloud. It doesn’t matter if it is a public, private or hybrid cloud, because the way we use IT has changed forever. It’s all too clear.

Amazon has a massive repository of content; Google has been gobbling tons of data and statistics since its inception; Apple has made IT more human; and Facebook has changed the way we communicate. FastCompany magazine called Amazon, Apple, Google and Facebook the Big 4, and they will converge sooner or later into what the tornado chasers call a Perfect Storm. Every single effort that these 4 companies are making now will inevitably meet at one point, where content, communication, computing, data and statistics all become the elements of the Perfect Storm. And the outcome of this has never been clearer. As FastCompany put it:

“All of our activity on these devices produces a wealth of data, which leads to the third big idea underpinning their vision. Data is like mother’s milk for Amazon, Apple, Facebook, and Google. Data not only fuels new and better advertising systems (which Google and Facebook depend on) but better insights into what you’d like to buy next (which Amazon and Apple want to know). Data also powers new inventions: Google’s voice-recognition system, its traffic maps, and its spell-checker are all based on large-scale, anonymous customer tracking. These three ideas feed one another in a continuous (and often virtuous) loop. Post-PC devices are intimately connected to individual users. Think of this: You have a family desktop computer, but you probably don’t have a family Kindle. E-books are tied to a single Amazon account and can be read by one person at a time. The same for phones and apps. For the Fab Four, this is a beautiful thing because it means that everything done on your phone, tablet, or e-reader can be associated with you. Your likes, dislikes, and preferences feed new products and creative ways to market them to you. Collectively, the Fab Four have all registered credit-card info on a vast cross-section of Americans. They collect payments (Apple through iTunes, Google with Checkout, Amazon with Amazon Payments, Facebook with in-house credits). Both Google and Amazon recently launched Groupon-like daily-deals services, and Facebook is pursuing deals through its check-in service (after publicly retreating from its own offers product).”

Cloud is changing the way we work, we play and we live, and data is now the currency of humans in the developed and developing worlds. And that is good news for us storage professionals, because all the data has to eventually end up in storage somewhere, somehow.

That is why there is a strong demand for storage networking professionals. Not just any storage professionals, but the ones that have the right attitude to keep developing themselves, enhancing their skillset, knowledge and experience. The ones that can foresee that the future will worship them and label them as deities of the Cloud era.

So why aren’t you guys taking advantage of this? Well, don’t just sit there and be ordinary. Be a storage extraordinaire now! And for those guys who want to settle for being ordinary … too bad! I said this before – You could lose your job.

Happy school holidays!