3TB Seagate – a performance sloth

I can’t get home. I am stuck here at the coffee shop waiting out the traffic jam after the heavy downpour an hour ago.

It has been an interesting week for me, which began last week when we were testing the new Seagate 3TB Constellation ES.2 hard disk drives. It doesn’t matter whether they were SAS or SATA, because the disks spin at 7,200 RPM and are basically built the same. SAS or SATA is merely the conduit to the disks; the issue at hand lay with the drives themselves.

Here’s an account of the testing done by my team. They have been testing the drives meticulously, using every trick in the book to milk performance from the Seagate drives. In the end, it wasn’t performance we got; what we got were more like duds from Seagate where this type of drive is concerned.

How did the tests go?

We were using a Unix operating system to test sequential writes on different partitions of the disks, each partition a sizable number of GB; in one test we used 100GB per partition. With each partition, we worked from the outer cylinders towards the inner cylinders, and as the storage gurus will tell you, the outer tracks deliver a higher data rate than the inner tracks.

We thought it could be the file system we were using, so we switched the sequential writes to raw disks. We tweaked the OS parameters and tried various combinations of block sizes and so on. And what we discovered was a big surprise.
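For illustration, here is a minimal Python sketch of the kind of raw sequential-write sweep we ran, stepping from the outer tracks towards the inner tracks and measuring MB/sec per region. The device path, region size, block size and region count are hypothetical and this is not our exact test harness; on a real system you would run something like this as root against the raw device.

  import os, time

  DEV = "/dev/sdX"            # hypothetical raw device path
  REGION_GB = 100             # size of each test region ("partition")
  BLOCK_SIZE = 1024 * 1024    # 1 MiB writes
  REGIONS = 10                # sample outer -> inner across the disk

  def write_region(fd, offset, length, block):
      """Sequentially write `length` bytes starting at `offset`, return MB/sec."""
      os.lseek(fd, offset, os.SEEK_SET)
      start = time.time()
      written = 0
      while written < length:
          written += os.write(fd, block)
      os.fsync(fd)                       # make sure the data hits the platters
      return (written / (1024 * 1024)) / (time.time() - start)

  def main():
      block = b"\0" * BLOCK_SIZE
      region_bytes = REGION_GB * 1024**3
      fd = os.open(DEV, os.O_WRONLY)
      disk_bytes = os.lseek(fd, 0, os.SEEK_END)   # total device size
      for i in range(REGIONS):
          offset = i * (disk_bytes // REGIONS)    # step from outer to inner tracks
          mbps = write_region(fd, offset, min(region_bytes, disk_bytes // REGIONS), block)
          print(f"region {i} @ {offset // 1024**3} GiB: {mbps:.1f} MB/sec")
      os.close(fd)

  if __name__ == "__main__":
      main()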

The throughput we got from the sequential writes was horrible. It started out almost 25% lower than a 2TB Western Digital RE4 disk, and as the test moved inward, the throughput on the inner tracks dropped to single-digit MB/sec. According to reliable sources, the slowest published figures from Seagate were in the high 60s MB/sec, but what we got was closer to 20+ MB/sec. The Western Digital RE4 gave consistent throughput numbers throughout the test. We couldn’t believe it!

We scoured the forums looking for similar issues, but we did not find much about this. It could be a firmware bug. We are in the midst of opening an escalation channel to Seagate to seek an explanation. I would like to share what we have discovered, and the issue can be easily reproduced. Customers who have purchased storage arrays with 2TB or 3TB Seagate Constellation ES.2 drives, please take note. We were disappointed with the disks, but thanks go to my team for the diligent approach that led to this discovery.

Brocade is ripe again

Like seasonal fruit, Brocade is ready to be plucked from the Fibre Channel tree (again). A few years ago, it put itself up for sale. There were suitors, but no one offered to take up Brocade. Over the last few days the rumour mill has been at it again, and while Brocade did not comment, the talk has resurfaced.

Why is Brocade up for sale? One can only guess. Their stock has been pounded over the past months and, as of last Friday, stood at USD4.51. The news mentioned that Brocade’s market capitalization is around USD2.7-2.8 billion, low enough to be acquired.

Brocade has been a fantastic Fibre Channel company in the past, and still pretty much is. They survived the first Fibre Channel shake-up, and companies like Vixel, Gadzoox, and Ancor are no longer on the Fibre Channel industry map. They thrived throughout, until the Cisco MDS started making dents in Brocade’s armour.

Today, a big portion of their business still relies on Fibre Channel to drive revenues and profits. In 2008, they acquired Foundry Networks, a Gigabit Ethernet company, and it was the right move as the world was converging towards 10 Gigabit Ethernet. However, it is only in the past 2-3 years that Brocade has taken a more direct approach in this region, rather than spending most of their time on their OEM business. Perhaps this laggard approach and their inaction in the past have cost them their prime position, and now they are primed to be scooped up by probable suitors.

Who will the probable suitors be now? IBM, Oracle, Juniper and possibly even Cisco could be strong candidates. IBM makes a lot of sense because I believe IBM wants to own technology, and Brocade has a lot of technology and patents to offer. Oracle, hmm … they are not a hardware company. True, they bought Sun, but from my internal sources, Oracle is not keen on hardware innovation. They just want to sell more Oracle software licenses, keeping R&D and innovation on a short leash and keeping R&D costs on Sun’s hardware low.

Juniper makes sense too, because they have a sizeable Ethernet business. I was a tad disappointed when I learned that Juniper had started selling entry-level Gigabit switches, because I have always placed them at lofty heights with their routers. But I guess, as far as business goes, Juniper did the only natural thing – if there is money to be made, why not? If Juniper takes up Brocade, they will have 2 formidable storage networking businesses: Fibre Channel and Data Center Ethernet (DCE). The question now is – does Juniper want the storage business?

If Cisco buys Brocade, that would mean alarm bells everywhere. It would trigger US regulators to look into the anti-competitive implications of the purchase. Unfortunately, Cisco has become a stagnant giant, and John Chambers, their CEO, is dying to revive the networking juggernaut. There were also rumours of Cisco breaking up to unlock the value of the many, many companies and technologies they acquired in the past. I believe buying Brocade does not help Cisco because, as with their past acquisitions, there are too many technology similarities for them to extract Brocade’s value.

We will not know how Brocade will fare in 2012, suitors or not; they are, after all, profitable. Unfortunately, the stock options scandal last year, plus the poor track record of acquisitions such as NuView, Silverback, and even Foundry Networks, is not helping to put Brocade in a different light.

If the rumours are true, putting itself up for sale only cheapens the Brocade image. Quid proxima, Brocade?

Signs of things to come?

I wanted to sign off early tonight, but an article in ComputerWorld caught my tired eyes. It was titled “EMC to put hardware into servers, VMs into storage” and after I read it, I couldn’t help but juxtapose the article with what I said earlier in my blogs, here and here.

It is very interesting to note that “EMC runs vSphere directly on the storage controllers and then uses vMotion to migrate VMs from application servers onto the storage array, ..” since the storage boxes have enough compute power to run virtual machines. The traditional and widely accepted view is that VMs run on servers. Contrary to that belief, EMC has already demonstrated VMs running on their VNX, Isilon and Symmetrix.

And soon, with EMC’s Project Lightning (announced at EMC World in May 2011), they will be introducing server-side PCIe-based SSDs, à la Fusion-io. This is different from the NetApp PAM/FlashCache PCIe card, which sits in their arrays, not in hosts or servers. It is also very interesting to note that this EMC server-side PCIe flash SSD card will become a bridge to EMC’s FAST (Fully Automated Storage Tiering) architecture, enabling it to place hot, warm and cold data strategically on different storage tiers for the applications on VMware’s VMs (now on either the server or the storage), perhaps using vMotion as a data mover on top of the “specialized” link created by the server-side EMC PCIe card.
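To make the tiering idea concrete, here is a toy Python sketch of how a FAST-like policy engine might classify data as hot, warm or cold by access frequency and place it on different tiers, including a server-side flash tier. The tier names and thresholds are my own assumptions for illustration, not EMC’s actual FAST algorithm.

  from dataclasses import dataclass

  # Hypothetical tiers, fastest to slowest; thresholds are illustrative only
  TIERS = [
      ("server-side PCIe flash", 1000),   # accesses/day to qualify as "hot"
      ("array SSD tier",          100),   # "warm"
      ("15k SAS tier",             10),
      ("NL-SAS/SATA tier",          0),   # everything else is "cold"
  ]

  @dataclass
  class Extent:
      name: str
      accesses_per_day: int

  def place(extent: Extent) -> str:
      """Return the tier an extent should live on under this toy policy."""
      for tier, threshold in TIERS:
          if extent.accesses_per_day >= threshold:
              return tier
      return TIERS[-1][0]

  if __name__ == "__main__":
      for e in [Extent("oltp-index", 5000), Extent("monthly-report", 40), Extent("archive", 1)]:
          print(f"{e.name:>15} -> {place(e)}")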

This also blurs the line between servers and storage and creates a virtual architecture between them, because what used to be the distinct data boundary of the server is now melded, virtually, into the EMC storage array.

Two red alerts are flashing in my brain right now.

  1. The “bridge” has just linked the server back to the storage, after years of talking about networked storage. The server is ONE again with the storage. Doesn’t that look to you like a server with plenty of storage? It has come full circle. But more interesting, and what I am eager to see, is what more this “bridge” is capable of when it comes to data management. vMotion might be the first of many new “protocol” breeds to enhance data management and mobility across this “bridge”. I am salivating right now at this massive potential.
  2. What else can EMC do with the VMware API? The capability I am writing about right now is made possible by EMC tweaking VMware’s API to extract much, much more. As the VMware vStorage API is continually enhanced, the potential is, again, massive and could change the entire landscape of cloud computing and, subsequently, the entire IT landscape. This is another Pavlov’s dog moment (see the figures below as part of my satirical joke on myself).

 

Sorry, the diagram below is not related to this blog entry. It is just my way of describing myself right now. 😉

I am extremely impressed with what EMC is doing. A lot of smarts and thinking have gone into this, and it is definitely a sign of things to come. The server and the storage are “merging again”. Think of it as Borg assimilation in Star Trek.

Resistance is futile!

Big data is a big headache

IBM claims that we are responsible for creating 2.5 quintillion bytes of data every day. How much is 1 quintillion?

 

According to the web,

1 quintillion = 1,000,000,000,000,000,000

After billion comes trillion, then quadrillion, and then quintillion. That’s what 1 quintillion is – 18 zeroes!
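Putting IBM’s number into storage terms, a quick back-of-the-envelope calculation (the 3TB drive is just a convenient yardstick):

  daily_bytes = 2.5 * 10**18              # 2.5 quintillion bytes, per IBM's claim
  exabytes = daily_bytes / 10**18         # decimal exabytes per day
  drives_3tb = daily_bytes / (3 * 10**12) # how many 3TB drives that would fill daily
  print(f"{exabytes} EB per day, or roughly {drives_3tb:,.0f} x 3TB drives every single day")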

This data comes from everything: social networking updates, meteorology (weather reports), remote sensing maps (Google Maps, GPS, Geographical Information Systems), photos (Flickr), videos (YouTube), Internet searches (Google) and so on. Big data, according to Wikipedia, refers to datasets too large to be handled and processed by conventional data management tools. This presents a new set of difficulties when it comes to collecting, storing and sharing the data. Indexing and searching big data requires special technologies to mine and extract valuable information from big data datasets within an acceptable period of time.

According to Wikipedia, “Technologies being applied to big data include massively parallel processing (MPP) databases, datamining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.” That is why EMC paid big money to acquire Greenplum and IBM acquired Netezza. Traditional data warehousing players such as Teradata, Oracle and Ingres are in the picture as well, setting a collision course between the storage and infrastructure companies and the data warehousing solutions companies.

The 2010 Gartner Magic Quadrant has seen non-traditional players such as IBM/Netezza and EMC/Greenplum in its Leaders quadrant.

 

And the key word that is already on everyone’s lips is “ANALYTICS”.

The ability to extract valuable information that helps determine the next trend, and to build personalized profiles, may have already arrived, as companies clamour to learn more and more about our personalities so that they can sell us more of their wares.

Meteorological organizations are using big data analytics to understand weather patterns and climate change. Space exploration becomes more acute and precise thanks to the tons and tons of data collected from each mission. Big data analytics is also helping pharmaceutical companies develop new biological and pharmaceutical breakthroughs. And the list goes on.

I am a newcomer to big data and I do not proclaim to know a lot. But terms such as scale-out NAS, distributed file systems, grid computing and massively parallel processing are certainly bringing the data storage world into a new frontier, and it is something we as storage professionals have to adapt to. I am eager to learn and know more about big data. It is a big headache, but change is inevitable.

NFS version 4

NFS has been around for almost 20 years and is the de facto network file sharing protocol for Unix and Linux systems. There has been much talk about NFS, especially versions 2 and 3, and IT guys used to joke that NFS stands for “No F*king Security”.

NFS version 4 was different, borrowing ideas from Windows CIFS and Carnegie Mellon’s Andrew File System. And from its inception 11 years ago in IETF RFC 3010 (revised in 2003 with IETF RFC 3530), the notable new features of NFSv4 are:

Performance enhancement – One key enhancement is the introduction of the COMPOUND RPC procedure, which allows the NFS client to group a bunch of file operations into a single request to the NFS server. This not only reduces network round-trip latency, but also cuts down the chatter of many small file operations.
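A rough way to picture COMPOUND is batching: instead of one RPC per operation, each paying a full network round trip, the client sends one request carrying the whole sequence. A toy Python sketch of the latency difference, using an assumed round-trip time and an illustrative operation sequence rather than real NFS wire procedures:

  RTT_MS = 2.0        # assumed network round-trip time per request
  OPS = ["PUTFH", "LOOKUP", "OPEN", "READ", "CLOSE"]   # illustrative op sequence

  def v3_style(ops):
      """One RPC per operation: every op pays a full round trip."""
      return len(ops) * RTT_MS

  def v4_compound(ops):
      """One COMPOUND RPC carrying all ops: a single round trip."""
      return 1 * RTT_MS

  print(f"NFSv3-style:    {v3_style(OPS):.1f} ms of round-trip latency")
  print(f"NFSv4 COMPOUND: {v4_compound(OPS):.1f} ms of round-trip latency")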

 

Removing the multiple daemons in NFSv3 – NFSv3 uses various daemons/services and various protocols to do the work. There is portmapper (TCP/UDP 111), which provides the port numbers for the mounting and NFS services. There’s the mountd service (on an arbitrary port assigned by portmapper) that does the mounting of NFS exports for the NFS clients. There’s nfsd on TCP/UDP 2049, and so on. The command ‘rpcinfo -p’ lists all the ports and services related to NFS.

 

There are other features as well, such as:

Firewall friendly – The use of portmapper dishing out arbitrary ports made life difficult for firewall administrators. NFSv4 changed that by consolidating most of the TCP/IP services onto well-known ports which the security administrator can define in the firewall.
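As a quick illustration of the well-known-port point: for NFSv4 the firewall administrator essentially only needs to permit TCP 2049, whereas NFSv3 also needed portmapper (111), mountd and friends. A simple Python reachability check (the server host name is hypothetical):

  import socket

  def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
      """Return True if a TCP connection to host:port succeeds."""
      try:
          with socket.create_connection((host, port), timeout=timeout):
              return True
      except OSError:
          return False

  # NFSv4 consolidates onto TCP 2049; NFSv3 also relied on portmapper (111), mountd, etc.
  for port in (111, 2049):
      print(f"nfs-server.example.com:{port} open -> {port_open('nfs-server.example.com', port)}")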

 

Stateful – NFSv3 is stateless and does not maintain the state of the NFS clients. NFSv4 is stateful and implements mandatory locking and delegation mechanisms. Leases for locks, granted by the servers to the clients, were introduced. A lease is a time-bounded grant for the control of the state of a file, and it is implemented through locks.

Mandated Strong Security Architecture – NFSv4 requires the implementation of a strong security mechanism that is based on cryptography. Previously the strongest security flavour was AUTH_SYS, which is level 1 clearance.

 "The AUTH_SYS security flavor uses a host-based authentication model
where the client asserts the user's authorization identities using small
integers as user and group identity representations (this is EXACTLY how NFSv3
authenticates by default). Because of the small integer authorization ID
representation, AUTH_SYS can only be used in a name space where all clients and
servers share a uidNumber and gidNumber translation service. A shared translation
service is required because uidNumbers and gidNumbers are passed in the RPC
credential; there is no negotiation of namespace in AUTH_SYS."

The NFSv4 security mechanism is based on RPCSEC_GSS, a level 6 clearance. RPCSEC_GSS is an API that is more than an authentication mechanism; it performs integrity checksums and encryption across the entire RPC request and response operations. This is further strengthened by the integration of Kerberos v5 for user authentication, which is quite similar to the Windows CIFS Kerberos implementation, providing a time-based ticket to gain authentication.

In addition to that, there are many other cool, new features in NFSv4. There was a further extension to NFSv4 last year, in 2010, when NFSv4.1 was added in IETF RFC 5661. As quoted in Wikipedia, NFSv4.1 “aims to provide protocol support to take advantage of clustered server deployments including the ability to provide scalable parallel access to files distributed among multiple servers (pNFS extension).”

NFSv4 has much to offer. The future is bright.

Go on and be a storage extraordinaire

ex·tra·or·di·naire – Outstanding or remarkable in a particular capacity

I was surfing the Internet after dinner while holidaying in Port Dickson. At about this time, the news from my subscriptions arrives, perfectly timed as my food digests.

And in the news – “IDC Says Cloud Adoption Fuels Storage Sales”. You think?

We are generating so much data at this present moment that IDC is already saying we are doubling our data every 2 years. That’s massive, and a big part of it is being fueled by our adoption of the cloud. It doesn’t matter if it is a public, private or hybrid cloud, because the way we use IT has changed forever. It’s all too clear.

Amazon has a massive repository of content; Google has been gobbling tons of data and statistics since its inception; Apple has made IT more human; and Facebook has changed the way we communicate. FastCompany magazine called Amazon, Apple, Google and Facebook the Big 4, and they will converge sooner or later into what the tornado chasers call a Perfect Storm. Every single effort these 4 companies are making now will inevitably meet at one point, where content, communication, computing, data and statistics all become elements of the Perfect Storm. And the outcome of this has never been clearer. As FastCompany put it:

“All of our activity on these devices produces a wealth of data, which leads to the third big idea underpinning their vision. Data is like mother’s milk for Amazon, Apple, Facebook, and Google. Data not only fuels new and better advertising systems (which Google and Facebook depend on) but better insights into what you’d like to buy next (which Amazon and Apple want to know). Data also powers new inventions: Google’s voice-recognition system, its traffic maps, and its spell-checker are all based on large-scale, anonymous customer tracking. These three ideas feed one another in a continuous (and often virtuous) loop. Post-PC devices are intimately connected to individual users. Think of this: You have a family desktop computer, but you probably don’t have a family Kindle. E-books are tied to a single Amazon account and can be read by one person at a time. The same for phones and apps. For the Fab Four, this is a beautiful thing because it means that everything done on your phone, tablet, or e-reader can be associated with you. Your likes, dislikes, and preferences feed new products and creative ways to market them to you. Collectively, the Fab Four have all registered credit-card info on a vast cross-section of Americans. They collect payments (Apple through iTunes, Google with Checkout, Amazon with Amazon Payments, Facebook with in-house credits). Both Google and Amazon recently launched Groupon-like daily-deals services, and Facebook is pursuing deals through its check-in service (after publicly retreating from its own offers product).”

Cloud is changing the way we work, play and live, and data is now the currency of humans in the developed and developing worlds. And that is good news for us storage professionals, because all that data eventually has to end up in storage somewhere, somehow.

That is why there is a strong demand for storage networking professionals. Not just any storage professionals, but the ones with the right attitude to keep developing themselves, enhancing their skillset, knowledge and experience. The ones who can foresee that the future will worship them and label them as deities of the Cloud era.

So why not take advantage of this? Don’t just sit there and be ordinary. Be a storage extraordinaire now! And for those who want to settle for being ordinary … too bad! I said this before – you could lose your job.

Happy school holidays!

Do all-SSD arrays make sense?

I have been receiving email updates from Texas Memory Systems for many months now. I am a subscriber to their updates, and Texas Memory Systems is the granddaddy of flash and DRAM-based storage systems. They are not cheap, but they are blazingly fast.

Lately, more and more vendors have been coming out with all-SSD storage arrays. Startups such as Pure Storage, Violin Memory and Nimbus Data Systems have been pushing the envelope selling all-SSD storage arrays. A few days ago, EMC also announced their all-SSD storage array. As quoted, the new EMC VNX5500-F utilizes 2.5-in, single-level cell (SLC) NAND flash drives to deliver 10 times the performance of the hard-drive-based VNX arrays. And that is important, because EMC has just become one of the earliest big gorillas to jump onto the bandwagon.

But does it make sense? Can one justify investing in an all-SSD storage array?

At this point, especially in this part of the world, I predict that not many IT managers are willing to put their heads on the chopping block and invest in an all-SSD storage array. They would become guinea pigs in a very expensive exercise, and the state of the economy is not helping. Therefore automated storage tiering (AST) might stick better than an all-SSD storage array. The cautious and prudent approach is less risky, as I have mentioned in a past blog.

I wrote about Pure Storage in a previous blog and the notion that SSDs offer plenty of IOPS and throughput. If the performance gain translates into higher productivity and getting the job done quicker, then I am all for SSDs. In fact, given the extra performance numbers, the case for SSDs becomes easier to make.

There is no denying the fact that the industry is moving towards SSDs, and it makes sense. That day will come in the near future, but not right now for customers in this part of the world.

Ocarina rising

More than a year after Dell acquired Ocarina Networks, it finally surfaced last week in the form of the Dell DX Object Storage 6000G SCN (Storage Compression Node).

Ocarina is a content-aware storage optimization engine, and their solution is one of the best I have seen out there. Its unique ECOsystem technology, as described in the diagram below, is impressive.

Unlike most deduplication and compression solutions out there, the Ocarina Networks solution takes storage optimization a step further. Ocarina works at the file level, and given the crazy, crazy growth of unstructured files in the NAS space, on the web and in the clouds, storage optimization is one priority that has to be addressed immediately. It takes a 3-step process – Extract, Correlate and Optimize.

Today’s files are no longer a flat structure of a single object but more of a compound file where many objects are amalgamated from different sources. Microsoft Office is a perfect example of this. An Excel file consists of objects from Windows Metafile Formats, XML objects, OLE (Object Linking and Embedding) Compound Storage Objects and so on. (Note: that’s just Microsoft’s way of retaining monopolistic control.) Similarly, a web page is a compound of XML, HTML, Flash, ASP and PHP object code.

In Step 1, the technology takes files and breaks them down into their basic components. It is kind of like taking a car apart down to its nuts and bolts and laying every piece out on the gravel porch. That is the “Extraction” process, and it decodes each file to get at its fundamental components.

Once the compound file object is “extracted”, identified and indexed, each fundamental object is correlated in Step 2. The correlation is executed within the file and across files under the purview of Ocarina. Matching and duplicated objects are flagged and deduplicated. The deduplication is done at the byte level, unlike most deduplication solutions that operate at the block level. This deeper and more granular approach further reduces the storage capacity required, making Ocarina one of the most efficient storage optimization solutions currently available. That is why Ocarina can efficiently reduce the size of even zipped and highly encoded files.

It takes this storage optimization even further in Step 3. It applies content-aware compactors for each fundamental object type, uniquely compressing each object further. That means that there are specialized compactors for PDF objects, ZIP objects and so on. They even have compactors for Oil & Gas seismic files. At the time I was exposed to Ocarina Networks and evaluating it, it had about 600+ unique compactors.
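To visualise the three steps, here is a highly simplified Python sketch of the idea: break a compound file into its sub-objects, deduplicate matching objects by hashing their bytes, and hand each unique object to a type-specific “compactor”. The object types, compactors and extraction logic here are all invented for illustration (zlib stands in for the real content-aware compactors); Ocarina’s actual engine is obviously far more sophisticated.

  import hashlib
  import zlib

  def extract(compound_file: dict) -> list:
      """Step 1: break a compound file into its fundamental objects (toy model)."""
      return compound_file["objects"]          # e.g. XML parts, embedded images, OLE blobs

  def correlate(objects: list) -> dict:
      """Step 2: dedupe at the byte level by content hash, within and across files."""
      unique = {}
      for obj in objects:
          digest = hashlib.sha256(obj["data"]).hexdigest()
          unique.setdefault(digest, obj)       # duplicates collapse to one copy
      return unique

  def optimize(unique: dict) -> dict:
      """Step 3: apply a content-aware compactor per object type (zlib as a stand-in)."""
      compactors = {"xml": lambda d: zlib.compress(d, 9),
                    "image": lambda d: zlib.compress(d, 6)}
      return {h: compactors.get(o["type"], zlib.compress)(o["data"]) for h, o in unique.items()}

  if __name__ == "__main__":
      doc = {"objects": [{"type": "xml", "data": b"<sheet>1</sheet>"},
                         {"type": "xml", "data": b"<sheet>1</sheet>"},   # duplicate object
                         {"type": "image", "data": b"\x89PNG...fake..."}]}
      compacted = optimize(correlate(extract(doc)))
      print(f"{len(doc['objects'])} objects in, {len(compacted)} unique compacted objects out")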

After Dell bought Ocarina in July 2010, the whole Ocarina operation went into stealth mode. Many had already predicted that the Ocarina technology would be integrated and embedded into Dell’s primary storage solutions, Compellent and EqualLogic. It is not there yet, but will likely be soon.

Meanwhile, the first glimpse of Ocarina comes as a gateway solution to the Dell DX6000 Object Storage. DX Object Storage is a technology which Dell has OEMed from Caringo. The DX6000 Object Storage (I have not read about it in depth) follows the concept of the old EMC Centera, but with a much newer approach based on XML and HTTP REST. It has a published open API, and Dell is getting ISV partners such as CommVault, EMC, Symantec and StoredIQ to develop applications that interact with the DX6000.

(24/10/2011: Editor note: Previously I associated Dell DX6000 Object Storage with Exanet. I was wrong and I would like to thank Jim Dtuton of Caringo for pointing out my mistake)

Ocarina’s first mission is to reduce the big, big capacities in the Big Data space of the DX6000 Object Storage, and the Ocarina ECOsystem technology looks like a good bet for Dell as a key technology differentiator.

Dropbox – everyone literally dropping their pants

I am not a Dropbox user (yet).

But as far as user habits are concerned, Dropbox is literally on fire, and everyone is basically dropping their pants for them. Why? Because Dropbox solves a need that every one of us has and has been hoping someone else would solve.

It all started when the founder, Drew Houston, was on a bus ride from Boston to New York. He wanted to work during the 4-hour bus journey, and he had his laptop. Unfortunately, he had forgotten the thumb drive holding his work, and the Dropbox idea was born. Drew wrote some code to let him access his files anywhere, from any device. As they say, “Necessity is the mother of invention”, and it certainly was here.

Together with his fellow MIT student, Arash Ferdowsi, Drew Houston worked on the idea and got funding after that. In a short history of about 4 years, Dropbox had accumulated about 40 million users by June 2011. They based their business on “freemium”, a model that works by offering a product or service free of charge (typically digital offerings such as software, content, games or web services) while charging a premium for advanced features, functionality, or related products and services. And it’s catching on like wildfire.

So, how does Dropbox work? In my usual geeky way, the diagram below should tell the story.

The Dropbox service works flawlessly with MacOS, Windows and Linux, and it has client apps for Apple iOS and Google Android. A copy of the files is accessible anywhere, from almost any device, and this simplicity is the beauty of Dropbox.

Diving deeper, Dropbox clients communicate with the Dropbox server/service in the “cloud” from literally anywhere. Requests to open, read or write a file ride on a RESTful, cacheable communication protocol encapsulated in HTTP. For more info, you can learn about the Dropbox API here.
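As a rough illustration of that REST-over-HTTP pattern, a client request to list a folder or fetch a file might look something like the sketch below. This is not the actual Dropbox API: the endpoint, token and paths are hypothetical placeholders, and the `requests` library is a third-party Python package.

  import requests   # third-party HTTP library

  API_BASE = "https://api.example-sync.com/1"     # hypothetical REST endpoint
  TOKEN = "user-oauth-token"                      # placeholder credential

  def list_folder(path: str) -> dict:
      """GET the metadata for a folder; the response is cacheable JSON."""
      resp = requests.get(f"{API_BASE}/metadata{path}",
                          headers={"Authorization": f"Bearer {TOKEN}"},
                          timeout=10)
      resp.raise_for_status()
      return resp.json()

  def download(path: str, local_name: str) -> None:
      """GET the file contents over the same HTTP channel."""
      resp = requests.get(f"{API_BASE}/files{path}",
                          headers={"Authorization": f"Bearer {TOKEN}"},
                          timeout=30)
      resp.raise_for_status()
      with open(local_name, "wb") as f:
          f.write(resp.content)

  if __name__ == "__main__":
      print(list_folder("/Work"))                  # e.g. {"contents": [...], "hash": "..."}
      download("/Work/report.xlsx", "report.xlsx")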

More about Dropbox in the YouTube video below:

One of the concerns about the cloud is security, and unfortunately Dropbox got hit when a security flaw was exposed in June 2011. For a period of almost 4 hours after a Dropbox maintenance upgrade, a lot of users’ folders were viewable by everyone else. That was scary, but given the freemium service, that is something the users have to accept (or is it?).

This wildfire of an idea is beginning to take shape in the enterprise as well, with security being the biggest thing to address. How do you maintain simplicity and keep users from feeling threatened while, at the same time, imposing security fences, data integrity and compliance for corporate responsibility? That’s the challenge IT has to face.

Hence, necessity is the mother of invention again. Given the requirement for enterprise-grade file sharing, the need for IT to address concerns about security, integrity, controls, compliance and so on, not to mention the growing magnitude of files in the organization, Novell, which I mentioned in an earlier blog, will be introducing something similar by early next year, in 2012. This will be the security-enhanced, IT-controlled, user-pleasing file sharing and file access solution called Novell Filr. There’s a set of presentation slides out there.

We could see the NAS landscape changing as well, because the user experience is forcing IT to adapt. Dropbox is one of the pioneers in this new market space, and we will see more copy-cats out there. What’s more important now is how enterprise NAS vendors will address this space.

A wizer IBM

A couple of nights ago, IBM launched a slew of new storage technology updates and a new cloud service called SmartCloud Enterprise, which incorporates some cloud technology from Nirvanix.

There were updates to IBM XIV, SVC, SONAS and also the DS8800, and the announcement reached us with a big bang. One of the notable updates that caught my eye was the IBM Storwize V7000. When IBM first acquired Storwize in 2010, the solution was meant to be a compression engine in front of NAS storage. And it stayed pretty much that for a while, until the new Storwize V7000.

The new Storwize V7000 is now a unified storage array, a multiprotocol box that IBM has positioned to compete with the EMC VNX series. In the news, the V7000 has the block virtualization code from the IBM SVC, file support, a file distribution policy engine called ActiveCloud, and also includes remote replication (Metro & Global Mirror), automatic storage tiering (EasyTier), clustering and storage virtualization. It also sports a new user interface, inherited from the IBM XIV Gen3 GUI, that can manage both files and blocks.

The video below introduces the V7000:

While IBM is being courteous to NetApp (the NetApp FAS series is sold as IBM’s N series) by saying that their cannons are pointed towards EMC’s VNX, one cannot help but question the strong possibility of the V7000 hurting N series sales as well. NetApp could see this relationship sailing into choppy waters ahead.

To me, the current IBM storage technology lineup is scattered. It is everything to everyone, and there are things in need of sharpening. HDS has certainly made great leaps in getting their act together, and they have gained strong market share in the past 2 quarters. Dell and HP have not done so well, because their stories just don’t gel. It’s about time IBM got going with their own technology and, more importantly, consolidated their storage technology lineup into a more focused strategy.

This is a great announcement for IBM and they are getting wizer!