Chink in NetApp MetroCluster?

Ok, let me clear the air about the word “Chink” (before I get into trouble), which is not racially offensive unlike the news about ESPN having to fire 2 of their employees for using the word “Chink” on Jeremy Lin.  According to my dictionary (Collins COBUILD), chink is a very narrow crack or opening on a surface and I don’t really know the derogatory meaning of “chink” other than the one in my dictionary.

I have been doing a spot of work for a friend who has just recently proposed NetApp MetroCluster. When I was at NetApp many years ago, I did not have a chance to get to know more about the solution, but I do know of its capability. After 6 years away, coming back to do a bit of NetApp was fun for me, because I was always very comfortable with the NetApp technology. But NetApp MetroCluster, and in this opportunity, NetApp Fabric MetroCluster presented me an opportunity to get closer to the technology.

I have no doubt in my mind that this is one of the most highly available storage solutions in the market, and NetApp is not modest about beating its own drum. It touts “No SPOF (Single Point of Failure)”, and rightly so, because it has put in all the right plugs for all the points that can fail.

NetApp Fabric MetroCluster is a continuous availability solution that stretches over 100km. It is basically a NetApp cluster with mirrored storage, but with the two halves of the infrastructure placed very far apart and linked over Fibre Channel components and dark fiber. Here’s a diagram of how NetApp Fabric MetroCluster works for a VMware FT (Fault Tolerant) environment.

There’s a lot of simplicity in the design, because when I started explaining it to the prospect, I was amazed how easy it was to articulate, without all the fancy technical jargon or fuss. I just said … “imagine a typical cluster, with an interconnect heartbeat, and the storage is mirrored. Then imagine the 2 halves being pulled very far apart … That’s NetApp Fabric MetroCluster”. It was simply blissful.
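That one-line explanation can be sketched as a toy model. This is only an illustration of the idea (two halves, every write landing on both, either half able to serve data alone). The `Site` and `StretchedCluster` names are made up for this sketch and are not NetApp APIs, and real behaviour such as the heartbeat, takeover and resync after a site returns is not modelled.

```python
class Site:
    """One half of the stretched cluster: a controller plus its copy of the data."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}


class StretchedCluster:
    """Hypothetical model of two mirrored halves pulled far apart."""

    def __init__(self, site_a, site_b):
        self.sites = [site_a, site_b]

    def write(self, block, data):
        # The write lands on every surviving half before it is acknowledged.
        live = [s for s in self.sites if s.online]
        if not live:
            raise RuntimeError("no surviving site")
        for site in live:
            site.blocks[block] = data
        return len(live)  # number of copies written

    def read(self, block):
        # Any surviving half can serve the data.
        for site in self.sites:
            if site.online:
                return site.blocks[block]
        raise RuntimeError("no surviving site")


site_a, site_b = Site("A"), Site("B")
cluster = StretchedCluster(site_a, site_b)
copies_before = cluster.write(0, "payroll")   # both halves hold the block

site_a.online = False                          # lose an entire site
survivor_data = cluster.read(0)                # still served from site B
copies_after = cluster.write(1, "orders")      # writes continue on one half
```

After the simulated site loss, the data written earlier is still readable and new writes are still accepted, which is the availability promise in a nutshell. It also shows why the surviving half, taken on its own, is a single copy.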

But then there was a lot of FUD (fear, uncertainty, doubt) thrown in by the competitor, feeding the prospect with plenty of ammunition. Yes, I agree with some of the limitations, such as no SATA support for now. But then again, there is no perfect storage solution. In fact, Chris Mellor of The Register wrote about God’s box, the perfect storage, but to get to that level, be prepared to spend lots and lots of money! Furthermore, once you fix one limitation or bottleneck in one part of the storage, it introduces a few more challenges here and there. It’s never ending!

Side note: The conversation triggered the team to check with NetApp for SATA support in Fabric MetroCluster. Yes, it is supported in ONTAP 8.1, and the present version is 8.1RC3. So yes, SATA support will be here soon.

More FUD came as we went along, and when I was doing my research, some HP storage guys on the web were hitting at NetApp MetroCluster. Poor HP! If you do a search of NetApp MetroCluster, I am sure you will come across these 2 HP blogs from 2010, deriding the MetroCluster solution. Check out this and the followup on the first blog. What these guys chose to do was to break the MetroCluster apart into 2 single controllers after a network failure, and attack it from that level.

Yes, when you break up the halves, it is basically a NetApp system with several single points of failure (SPOF). But then again, who isn’t? Almost every vendor’s storage will have some SPOFs when you break the mirror.

Well, what I can tell you is that the weakness of NetApp MetroCluster is that it’s not continuous data protection (CDP). Once your applications have written garbage on one volume, the garbage is reflected on the mirrored volume. With the mirror alone, you can’t roll back and you live with the data corruption. That is why storage vendors, including NetApp, offer snapshots: point-in-time copies that let you roll back to a point before the data corruption occurred. CDP goes further and gives complete granularity of recovery, down to every write I/O, and that’s something NetApp does not have. That’s NetApp MetroCluster’s weakness.

But CDP is aimed towards data recovery, NOT data availability. It is focused on customers whose requirement is the ability to get the data back to some usable state or form after a disaster (big or small), while the MetroCluster solution is focused on having the data available all the time. They are 2 different sets of requirements. So, it depends on what the customer’s requirement is.
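To make the difference between a mirror, a snapshot and CDP concrete, here is a minimal sketch. It is a toy model with names I made up (`Volume`, `replay_to`), not how any vendor implements these features: the mirror copies every write as it happens, a snapshot freezes the state at one point in time, and the CDP journal records every write so the state can be rebuilt as of any write.

```python
class Volume:
    """Toy volume with a synchronous mirror, snapshots and a CDP-style journal."""

    def __init__(self):
        self.blocks = {}      # primary copy
        self.mirror = {}      # synchronous mirror of the primary
        self.snapshots = {}   # name -> frozen point-in-time copy
        self.journal = []     # every write, in order (the CDP idea)

    def write(self, block, data):
        self.journal.append((block, data))
        self.blocks[block] = data
        self.mirror[block] = data   # the mirror faithfully copies everything

    def snapshot(self, name):
        self.snapshots[name] = dict(self.blocks)

    def replay_to(self, n_writes):
        """Rebuild the state as it was after the first n_writes writes."""
        state = {}
        for block, data in self.journal[:n_writes]:
            state[block] = data
        return state


vol = Volume()
vol.write("a", "good-1")
vol.snapshot("hourly")
vol.write("b", "good-2")
vol.write("a", "GARBAGE")                   # the application corrupts block a

mirror_view = vol.mirror                    # garbage is on the mirror too
snapshot_view = vol.snapshots["hourly"]     # clean, but block b is missing
cdp_view = vol.replay_to(2)                 # state just before the bad write
```

The mirror ends up holding the same garbage as the primary, the snapshot is clean but has lost the good write made after it was taken, and the journal replay recovers everything up to the write just before the corruption. The mirror buys availability and the other two buy recovery, at different granularity.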

Then again, come to think of it, NetApp has no CDP technology of its own … does it?


About cfheoh

I am a technology blogger with 30 years of IT experience. I write heavily on technologies related to storage networking and data management because those are my areas of interest and expertise. I introduce technologies with the objective of getting readers to know the facts and use that knowledge to cut through the marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress. I am involved in SNIA (Storage Networking Industry Association) and from 2013 to 2015, I was the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I am currently employed at iXsystems as their General Manager for Asia Pacific Japan.

14 Responses to Chink in NetApp MetroCluster?

  1. Pingback: Chink in NetApp MetroCluster? « Storage Gaga

  2. jgan says:

    Was it 10 years ago? Kinda remember seeing the NetApp brochures on this one. 🙂

    • cfheoh says:

      10 years? Did I mention anything about 10 years ago? This product was introduced in 2005 as I recall.

      /Chin Fah

  3. storage rules says:

    “Yes, when you break up the halves, it is basically a NetApp system with several single points of failure (SPOF). But then again, who isn’t?”

    Well, many vendors, including the one you have derided (HP storage), aren’t. If you use, for example, HP 3PAR, EVA or XP with their Cluster Extension for Windows software, you get a solution that has no SPOF within the data center and automatic failover between data centers. This is because, unlike NetApp MetroCluster, they don’t stretch the controllers between the two locations, which gives you no SPOF, i.e. High Availability within the data center and Disaster Tolerance across data centers.

    • cfheoh says:


      Sorry for the late reply. An avalanche of work in the past few weeks.

      I would like to set the record straight about what I wrote. I am well aware of HP’s HA and Cluster offerings as well as other solutions in the market. But if you take the time to read what the HP blogger wrote, what he did was to break apart the NetApp MetroCluster solution and deride it as a single half, pointing out SPOFs in one half of the MetroCluster.

      That was the reason why I wrote the comment “when you break up the halves, it is basically a NetApp system with several single points of failure (SPOF). But then again, who isn’t?” Perhaps the “who isn’t” part sounded to you as if I do not know my industry well, and perhaps it sounded shallow. That was exactly how I felt about the HP blogger’s article when I read it.

      Either way, and whatever your opinions are, I thank you for your input.


  4. Chr1g1 says:

    NetApp SE here. Thanks for your nice words.
    Actually, we have thousands of active MetroClusters installed. The solution is just awesome and it saves a lot of money too, by stretching out one cluster pair. It is a real pain for our competitors, and that might be the main reason they are spreading FUD about it and trying to find flaws that are not really relevant in daily business life. In fact, the mirroring is so easy to manage that it is just a no-brainer in terms of management too.
    SATA is supported as of ONTAP 8.1 and we already have many systems running with it.
    Comments that MetroClusters are not non-disruptively upgradeable when there is a high number of snapshots present are just nonsense. We upgraded systems with 2500 LUNs and thousands of snapshots completely non-disruptively while running at 80% CPU.
    You should get one too 😉

    • cfheoh says:

      Hi Chrigi

      Thanks for your comments.

      I am aware of what MetroCluster can do and there aren’t many like it. I was doing some work for a NetApp partner last month, and the competition was IBM. The amount of FUD that came from IBM about their V7000 was the reason why I wrote my blog entry that day.

      I intend to correct that, by sharing information that matters, not FUD and hype. I try my best to be impartial in what I write, but at times, my heart is still with NetApp. FYI, I am an ex-NetAppian and worked for them for 6 years in the South Asia region. I am still very passionate about their SE culture, their passion for being innovators in the storage industry and, most of all, being the “little engine that could”.

      Thanks and all the best to you!


    • Sergey says:

      Your experience is very interesting to me. We are actually in the process of selecting new storage for tier 1. One of the options is MetroCluster based on the new FAS3250. I’m looking for “bad cases” with MetroCluster. Do you have any experience where this solution was unhelpful for data availability (disasters, crashes etc.)?

      • cfheoh says:

        Hello Sergey,

        The MetroCluster implementations and support I have been involved in did not have “bad cases” as far as I know. If the implementation is done to the letter and with NetApp PS engaged, it will go like clockwork.

        However, MetroCluster is not for everybody. There is a high cost of investment, and there are times when a new acquisition budget is better used elsewhere if the RPO objectives are less demanding. MC is for *remote continuous data availability* over distances of about 100-200km. There are limitations and boundaries.

        If your objectives can tolerate an RPO of 15-30 min, then an IP-based replication solution is good enough. This is usually coupled with a single view of the 2 replicated copies to maintain almost seamless continuity of data access from either data volume.

        Hope this helps.

        Thank you

        • Sergey says:

          Thank you for the answer.
          We are planning to use MetroCluster as Tier 1 storage for data and systems where the RPO is close to 0. For instance, payment card processing systems.
          Maybe you know some special points about MC which must be considered in the process of planning and implementing MC.
          By the way, in our case the competitive solution we are considering is the HITACHI HUS-VM.

  5. L8Nico says:

    Would you mind looking into how Coraid is doing this?

    • cfheoh says:

      Hi Niklas,

      Sure, I am linked up with the Coraid guys on LinkedIn. I can ask them. Will keep you posted.


  6. Jim says:

    SATA works very nicely with MetroCluster now. There is an ATTO FibreBridge 6500N now in place of where you showed the Brocade 200E in your topology drawing. Love the fact that storage costs are a bit lower, although as you say there are points of weakness and failure in just about any solution. Get me God’s box and I will be ok!

    • cfheoh says:

      Hi Jim

      Thanks for your comments. Yes, I have not gotten around to writing about NetApp’s latest updates, but I am fully aware of the SATA support as well as the Shared Fabric validation. This has further reduced costs by allowing 4 switches to support 2 sets of MetroCluster configurations.

      All the best to you!

