I have always wanted to look deeper into OpenStack, but I never got around to it. However, last week, something about NASA and OpenStack caught my attention … something about NASA pulling out of OpenStack development.
The spin was that “OpenStack has come into its own”, and that is true: OpenStack today has 180 companies (at last count on June 20th 2012) participating and contributing to the development, deployment and marketing of the highly popular Infrastructure-as-a-Service cloud computing project. So the NASA withdrawal itself did not sting as much as what NASA said next.
When NASA CIO Linda Cureton announced that NASA had shifted to Amazon Web Services (AWS) for their enterprise cloud-based infrastructure and had saved almost a million dollars in costs, that was a clear and blatant impalement to the very heart and soul of OpenStack. NASA, one of the 2 founders of OpenStack in 2010, has switched sides and announced their preference for OpenStack’s rival, AWS. It pains me just to hear of such a defection.
VMware is not a panacea for all your server virtualization requirements but because they do fantastic marketing (not to mention doing 1 small seminar every 1.5-2 months here in Malaysia last year), everyone thinks they are the only choice for server virtualization.
Efforts from Citrix Xen, Microsoft Hyper-V and RedHat Virtualization do not seem to make a dent in VMware’s armour, and it is beginning to feel as if VMware is the only choice for server virtualization. However, every new server virtualization proposal ends up with the customer buying a brand new, much more powerful server. More CPUs, more cores, and more RAM (I am not going into VMware vRAM licensing issues here, but customers know they are caged in).
You see, VMware’s style of server virtualization is in-system virtualization. The physical resources within the system are pooled, virtualized and shared with the virtual machines (VMs) in that physical chassis. With the exception of distributed vSwitches (dvSwitch), CPUs, processing cores and RAM are pretty much confined to what is available in the physical box in most server virtualization environments. You can envision the concept of VMware’s in-system virtualization in the diagram below:
So, the consolidation (and virtualization) phase of older physical servers would involve packing tons of CPU cores and tons of RAM into a newer, high-end server.
I just visited a prospect a few days ago. For about 30 users of an ERP system and perhaps 100 Zimbra mailboxes, he lamented that he had to invest in 2 Dell R710 servers with 64GB of RAM each, sporting 2 x 8-core Intel Xeons. That sounded like overkill to me, but that is what is happening here in this part of the world. The customer is fed a perception of inadequacy when they virtualize their servers: “What if I don’t have enough cores? What if I don’t have enough RAM?” That in itself is the typical Malaysian (and Singaporean) kiasu mentality. Check out the Wikipedia definition of kiasu here.
Such a high-end server costs a lot of moolah. Furthermore, the scalability and performance of the virtualized servers in the VMs are trapped within how far those servers can scale physically. If the server maxes out at 16 cores and 128GB of RAM, then the customer has to do another forklift upgrade. That’s not good.
And one more thing. VMware server virtualization is not ready for High Performance Computing (HPC) …yet.
Let’s look at this in another way. Let’s assume that you can approach server virtualization in an outward manner, rather than the inward, within-the-box thinking of the VMware in-system method.
What if you could invest in lower-end x86 servers, each with 1 x quad-core CPU and 8GB of RAM? What if you could aggregate many of these lower-end servers and build a large cluster of them into a huge symmetric multiprocessing server farm that supports 1,024 CPUs with 16,384 cores and 64TB of RAM? Have a look at this video that explains what I just mentioned:
Yeah, yeah … it’s a marketing video from ScaleMP. But I am looking beyond the company and at the possibility of this out-system type of server virtualization. The ability to pool the CPU processing power of many physical servers and aggregate the physical RAM of all the combined servers into a single shared memory architecture unleashes the true power of server virtualization. This is THE next generation symmetric multiprocessing (SMP) architecture, and it breaks free from the limitations and scalability constraints of the inward virtualization of physical servers.
In the past, SMP systems relied heavily on applications being programmed specifically to scale with them. Applications didn’t necessarily scale on-the-fly with SMP systems, and some level of configuration and programming had to be applied to address the proprietary SMP methods and interconnects. ScaleMP’s vSMP Foundation hypervisor solution removes the proprietary nature of SMP and brings x86 server virtualization to meet the demands of HPC.
Here’s a look at the high level architecture of ScaleMP vSMP:
This type of architecture is similar to the RNA Networks solution that I blogged about some time ago. RNA Networks, which was acquired by Dell late last year, based their solution on the RDMA technology and protocol, and was more about enhancing scalability and performance with memory pooling via its Memory Cloud. ScaleMP’s patent-pending technology goes further: it pools processing cores as well as memory, giving it greater scalability and performance, the much needed resources for the demands of HPC environments.
The folks at ScaleMP contacted me a couple of weeks back and shared some of their marketing datasheets and whitepapers. While the information passed to me was OK, I wish it had gone deeper into the technology and implementation. I hope they can share that, and I don’t mind signing an NDA.
Well, this is done pro bono, because I want everyone to know the choices and possibilities out there. It is my worldly cause to have people educated, because only by being informed do we make better choices. The server virtualization world isn’t always about VMware, you know.
It is kind of interesting that every vendor out there claims to be as open as they can be, but the reality is that the competitive nature of the game is forcing storage vendors to talk open while their actions are certainly not.
Confused? I am beginning to see a trend … a trend that is forcing customers to be locked in with a certain storage vendor. I am beginning to feel that customers are given fewer choices, especially when the brand of the server they select for their applications has implications on the brand of storage they will be locked into.
And surprise, surprise, SSDs are the pawns of this new cloak-and-dagger game. How? Well, I have been observing this for quite a while now, and when HP announced their SMART portfolio for their storage, it was time for me to say something.
In the announcement, it was reported that HP is coming out with its 8th generation ProLiant servers. As quoted:
“The eighth generation ProLiant is turbo-charging its storage with a Smart Array containing solid state drives and Smart Caching.”
“It also includes two Smart storage items: the Smart Array controllers and Smart Caching, which both feature solid state storage to solve the disk I/O bottleneck problem, as well as Smart Data Services software to use this hardware”
From the outside, analysts are claiming this is a reaction to the recent EMC VFCache product (I blogged about it here), and HP was quick to paint the EMC VFCache solution as a first-generation product, lacking the smarts (pun intended) of what the HP products have to offer. You can read about its performance prowess in the HP Connect blog.
The idea is very obvious. Put a PCIe-based flash caching card in the server, and use a proprietary caching/tiering technology that ties the server to a certain brand of storage. Only with this card, which (incidentally) works only with this brand of servers, will you, Mr. Customer, be able to take advantage of the performance power of this brand of storage. Does that sound open to you?
HP is doing it with its ProLiant servers; Dell is doing it with its ExpressFlash; EMC’s VFCache, while not advocating any brand of servers, is doing it because VFCache works only with EMC storage. We have seen Oracle doing it with Oracle Exadata: the Oracle Enterprise database works best with Oracle’s own storage, and the intelligence is in its SmartScan layer, a proprietary technology that works exclusively with the storage layer in the Exadata. Hitachi Japan, with its Hitachi servers (yes, Hitachi servers that we rarely see in Malaysia), has had such technology for the last 2 years. I wouldn’t be surprised if IBM and Fujitsu already have something in store (or perhaps I missed the announcement).
NetApp has been slow in the game, but we hope to see them coming out with their own server-based caching products soon. More pure-play storage vendors are already singing the tune of SSDs (though not necessarily server-based).
The trend is obvious too, because the messaging is almost always about storage performance.
Yes, I totally agree that storage (any storage) has a performance bottleneck, especially when it comes to IOPS, response time and throughput. And every storage vendor is claiming that SSDs, in one form or another, are the knight in shining armour, ready to rid the world of lousy storage performance. Well, SSDs are not the panacea for storage performance headaches, because while they solve some performance issues, they introduce new ones somewhere else.
But it is becoming an excuse to introduce storage vendor lock-in, and how have customers responded to this new “concept”? Things are fairly new right now, but I would always advise customers to find out and ask questions.
Cloud storage for no vendor lock-in? Going to the cloud brings cloud service provider lock-in as well, but that’s another story.
Admit it! You are a terabyte junkie! I am sure many of us have one terabyte or more of our personal “stuff” at home. Heck, I even heard from a friend that he has almost 20TB of high definition movies (thank you, Torrent!) at home! That’s crazy!
And what does the typical Malaysian consumer do after he or she runs out of hard disk space? In KL (our beloved capital city, Kuala Lumpur), they throng the Low Yat IT mall or extensions of it, like Digital Mall in PJ Section 14. In other towns and cities in Malaysia, PC fairs are popular, as consumers try to get the best price possible (we Malaysians are good at squeezing the most out of a deal).
It is difficult for the not-so-IT-literate consumer to differentiate which brand is the best. Buffalo, Iomega, DLink, Western Digital, etc. But the tide is changing, because these vendors want to tie you down for the rest of your digital life. You see, buying a small NAS for the home now comes with a big carrot, an incentive to keep you wanting more, and yet you can’t unbind yourself from the tether once you are hooked.
Cloud storage didn’t take off in a big way last year. But many cloud storage vendors know there are plenty of opportunities out there, so how do they get consumers to upload their files, photos and whatever stuff they might have to cloud storage? Ingeniously, they work together with the smaller NAS storage players and use these vendors’ product offerings as bait. They bundle a significantly large FREE capacity or data protection offering in the cloud storage as the carrot, and once the consumer decides to put their files in the cloud storage, boom, they are ensnared and become a long-term ATM for the Cloud Storage Provider.
Sneaky? No? I call this good, smart marketing. You have a market of opportunities out there, but cloud storage isn’t catching on. You have small NAS vendors reaching out to the consumer market, but it’s a brutal, competitive arena and margins are razor thin. It’s a win-win situation for both sides.
This is a move towards a market that scratches an itch. Consumers want reliable backup too, but consumer-grade disk drives fail every so often, laptops get stolen, and files can be infected by viruses. The list goes on, but the point is that the Cloud Storage Providers may have found a silver lining in getting consumers to leap into the cloud. And the whole idea of the small NAS vendor-big Cloud Backup dynamic duo just got a big endorsement last night. Guess who has decided to dip its grubby hands into the pie?
EMC, the 800-pound gorilla of the information and storage world, through its Iomega subsidiary, wants your money! They had just married Iomega with EMC Atmos. It was quoted:
“EMC subsidiary and data protection specialist Iomega announced the integration between Iomega network storage solutions and EMC Atmos, extending Atmos cloud-based data protection and sharing to Iomega’s network storage product offerings. The new integration gives small and midsize businesses (SMBs), remote offices and distributed enterprises access to any Atmos powered cloud around the world.”
Surprised? Not really, but I guess EMC needs to breathe new life into Atmos, and this marriage just extended Atmos’ life support system.
The next all-Flash product in my review list is SolidFire. Immediately, the niche that SolidFire is trying to carve out is obvious. It’s not for regular commercial customers; it is meant for Cloud Service Providers, because the features and the technology that they have innovated are clearly intended for the cloud.
Are they solid (pun intended)? Well, if they have managed to secure a Series B funding of USD$25 million (USD$37 million overall) from VCs such as NEA and Valhalla, and also angel investors such as Frank Slootman (ex-Data Domain CEO) and Greg Papadopoulos (ex-Sun Microsystems CTO), then obviously there is more than meets the eye.
The one thing I got while looking up SolidFire is that there is probably a lot of technology and innovation behind their nodes and their Element OS. They hold their cards very, very close to their chest, and I couldn’t get much good technology-related information from their website or on Google. But here’s a look at what the SolidFire is like:
SolidFire has only one product model, the 1U SF3010. The SF3010 has 10 x 2.5″ 300GB SSDs, giving it a raw total of 3TB per 1U. The minimum configuration is 3 nodes, and it scales to 100 nodes. The reason for starting with 3 nodes is, of course, redundancy. Each SF3010 node has 8GB of NVRAM and 72GB of RAM, and sports 2 x 10GbE ports for iSCSI connectivity, which is no surprise given that the core engineering talent came from LeftHand Networks (whose product is now the HP P4000). There is no Fibre Channel or NAS front end to the applications.
Each node runs 2 x Intel Xeon 2.4GHz 6-core CPUs. The 1U height is important to the cloud provider, as the price of floor space is an important consideration.
Aside from the SF3010 storage nodes, the other important ingredient is their SolidFire Element OS.
Cloud storage needs to be available. The SolidFire Helix self-healing data protection is a feature that is capable of handling multiple concurrent failures across all levels of their storage. Data blocks are replicated randomly but intelligently across all storage nodes to ensure that the failure or disruption of access to a particular data block is circumvented with another copy of the data block somewhere else within the cluster. The idea is not new, but it is effective; solutions such as EMC Centera and IBM XIV employ a similar approach for data availability. The self-healing capability ensures highly available storage where data is always accessible.
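SolidFire has not published the internals of Helix, so the snippet below is only a minimal sketch of the general idea of spreading block replicas across nodes; the node names, copy count and placement logic are my own assumptions, not SolidFire’s implementation.

```python
import hashlib
import random

def place_block(block_id: str, nodes: list[str], copies: int = 2) -> list[str]:
    """Pick 'copies' distinct nodes for a data block, so losing any single
    node still leaves at least one copy elsewhere in the cluster."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested redundancy")
    # Seed the choice with the block ID so placement is repeatable,
    # while still spreading blocks evenly across the cluster.
    rng = random.Random(hashlib.sha256(block_id.encode()).hexdigest())
    return rng.sample(nodes, copies)

cluster = [f"sf3010-node{i:02d}" for i in range(1, 6)]   # a 5-node cluster
print(place_block("volume7:block001234", cluster))
```

When a node dies, every block it held still has a copy on some other node, so the cluster can re-replicate in the background without any volume going offline.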
To address storage efficiency, having 3TB raw in the SF3010 is definitely not sufficient. Therefore, the Element OS always has thin provisioning, real-time compression and inline deduplication turned on. These features cannot be turned off and operate on fine-grained 4K blocks. Also important are the reclamation of zeroed blocks, the absence of reservations, and the lack of data movement in these innovations, which, SolidFire claims, means there is no I/O impact.
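To make the 4K-granularity idea concrete, here is a minimal sketch of inline deduplication plus compression on 4K blocks. It is my own illustration of the general technique, not the Element OS implementation; the hash, compressor and data structures are assumptions.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096        # 4K granularity, as SolidFire describes
block_store = {}         # fingerprint -> compressed block (the shared pool)

def write_volume_data(data: bytes) -> list[str]:
    """Split incoming data into 4K blocks, dedupe by content fingerprint,
    and compress only the blocks never seen before."""
    fingerprints = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in block_store:                 # new unique block
            block_store[fp] = zlib.compress(block)
        fingerprints.append(fp)                   # the volume keeps only pointers
    return fingerprints

# Two "volumes" writing identical data consume the space of one copy.
vol_a = write_volume_data(b"A" * 8192)
vol_b = write_volume_data(b"A" * 8192)
print(len(block_store), "unique block(s) stored")   # -> 1
```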
But the one feature that differentiates SolidFire when targeting storage for Cloud Service Providers is their guaranteed volume-level Quality of Service (QoS). This is important, and SolidFire has turned their QoS settings into an advantage. As a best practice, Cloud Service Providers should always leverage the QoS functionality to improve their storage utilization.
The QoS settings include (a rough sketch of such per-volume settings follows the list below):
Minimum IOPS – Lower IOPS means lower performance priority (makes good sense)
Maximum IOPS
Burst IOPS – for those performance spikes moments
Maximum and Burst MB/sec
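Here is a small sketch of how such per-volume QoS settings might be modelled and enforced. The tier names, numbers and burst-credit logic are hypothetical, purely to illustrate the idea of guaranteed floors, sustained ceilings and short bursts.

```python
from dataclasses import dataclass

@dataclass
class VolumeQoS:
    min_iops: int       # guaranteed floor when the cluster is under contention
    max_iops: int       # sustained IOPS ceiling for this volume
    burst_iops: int     # short-lived ceiling for spiky moments
    max_mbps: int       # sustained bandwidth ceiling, MB/sec
    burst_mbps: int     # burst bandwidth ceiling, MB/sec

def iops_ceiling(qos: VolumeQoS, burst_credits: int) -> int:
    """Return the IOPS limit to enforce right now: allow the burst ceiling
    while the volume still has burst credits, otherwise clamp to max."""
    return qos.burst_iops if burst_credits > 0 else qos.max_iops

# Hypothetical service tiers a cloud provider might sell.
bronze = VolumeQoS(min_iops=500, max_iops=2_000, burst_iops=4_000, max_mbps=50, burst_mbps=100)
gold = VolumeQoS(min_iops=5_000, max_iops=15_000, burst_iops=30_000, max_mbps=300, burst_mbps=600)

print(iops_ceiling(bronze, burst_credits=0))   # -> 2000 once burst credits run out
```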
The combination of QoS and storage capacity efficiency gives SolidFire an edge: cloud providers can scale both performance and capacity in a more balanced manner, something that is not so simple with traditional storage vendors that rely on lots of spindles to achieve IOPS performance, sacrificing capacity in the process. But then again, with SSDs, the IOPS are plenty (for now). SolidFire does not boast performance numbers of millions of IOPS or throughput in the tens of gigabytes per second like Violin, Virident or Kaminario; what they want is to be recognized as cloud storage as it should be in a cloud service provider environment.
SolidFire calls this Performance Virtualization. Just as we would get to carve our storage volumes from a capacity pool, SolidFire allows different performance profiles to be carved out from the performance pool. This gives SolidFire the ability to mix storage capacity and storage performance in a seemingly independent manner, customizing the type of storage bundling required of cloud storage.
In fact, SolidFire claims only 50,000 IOPS per storage node (including the IOPS needed for replicating data blocks). Together with their native multi-tenancy capability, the 50,000 or so IOPS align well with many virtualized applications, rather than focusing on a 10x performance improvement for a single application. Their approach is about a more balanced, spread-out I/O architecture for cloud service providers and the applications they serve.
Their management is also targeted at the cloud. It has a REST API that integrates easily with OpenStack, Citrix CloudStack and VMware vCloud Director. Seamless, easy integration matters because the CSPs already have their own management tools, and the REST-ready SolidFire API is built to slot into them.
The power of the SolidFire API is probably overlooked by storage professionals trained in the traditional manner. But what the SolidFire API does is provide the full (I mean FULL) capability of managing and provisioning the SolidFire storage. Fronting the API with REST means that it is really easy to integrate with an existing CSP management interface.
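To show what that kind of integration might look like, here is a rough sketch of a provisioning call a CSP portal could make over HTTP. The endpoint, field names and authentication are invented for illustration and do not reflect SolidFire’s actual Element API; the point is only how storage provisioning can be driven programmatically from existing management tools.

```python
import requests

# Hypothetical management endpoint -- real API paths and payloads will differ.
SF_MGMT = "https://sf-cluster.example.com/api"

def create_volume(name: str, size_gb: int, min_iops: int, max_iops: int, burst_iops: int):
    """Ask the storage cluster to carve out a volume with a QoS profile attached."""
    payload = {
        "name": name,
        "sizeBytes": size_gb * 1024**3,
        "qos": {"minIOPS": min_iops, "maxIOPS": max_iops, "burstIOPS": burst_iops},
    }
    resp = requests.post(
        f"{SF_MGMT}/volumes",
        json=payload,
        auth=("admin", "password"),
        timeout=30,
        verify=False,   # self-signed certs are common on management networks; verify TLS in production
    )
    resp.raise_for_status()
    return resp.json()

# A CSP's own portal could call this when a tenant orders a "gold" volume:
# create_volume("tenant42-db01", size_gb=500, min_iops=5000, max_iops=15000, burst_iops=30000)
```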
Together with the storage nodes and the Element OS, the whole package is aimed at being a more significant storage platform for Cloud Service Providers (CSPs). Storage has always been a tricky component in Cloud Computing (despite what all the storage vendors might claim), but SolidFire touts that their solution focuses on what matters most for CSPs.
CSPs would want to maximize their investment without losing their edge in the cloud offerings to their customers. SolidFire lists their benefits in these 3 areas:
Performance
Efficiency
Management
The edge in cloud storage is definitely solid for SolidFire. Their ability to leverage their position and steer away from the other all-Flash vendors’ battlezone makes sense, as they aim to gain market share in the Cloud Service Provider space. I only wish they would share more about their technology online.
Fortunately, I found a video by SolidFire’s CEO, Dave Wright, which gives great insight into SolidFire’s technology. Have a look (it’s almost 2 hours long):
[2 hours later]: Phew, I just finished the video above and the technology is solid. Just to summarize,
No RAID (which is a Godsend for service providers)
Aiming for USD5.00 or less per Gigabyte (a good number!)
General availability in Q1 2012
Lots of confidence about the superiority of their technology, as portrayed by their CEO, Dave Wright.
I like the way Amazon is building their Cloud Computing services. Amazon Web Services (AWS) is certainly on track to become the most powerful Cloud Computing company in the world. In fact, AWS might already be. And they are certainly not resting on their laurels, having launched 2 new services in as many weeks – Amazon DynamoDB (last week) and the Amazon Storage Gateway (this week).
I am particularly interested in the Amazon Storage Gateway, because it addresses one of the biggest fears of Cloud Computing head-on. A lot of large corporations are still adamant about keeping their data on-premise, where it is private and secure, and many remain very skeptical even though Cloud Computing is changing the IT landscape in a massive way. The barrier to entry for large corporations is not trivial, but Amazon is adapting to get more IT divisions and departments to try out Cloud Computing in a less disruptive way.
The new service is really about data storage and data backup for large corporations. This is important because large corporations have plenty of requirements for data storage and for data to be backed up. And as we know, a large portion of the data stored does not need to be transactional or accessed frequently. This set of data is usually less frequently used, kept for archiving or regulatory compliance reasons, particularly in the banking and healthcare industries.
In data backup operations, the reason data is backed up is to provide a recovery mechanism when disaster strikes. Large corporations back up tons of data every day, week or month, and this data only has value when there is a situation that requires data relevance, data immediacy or data recovery. Otherwise, it is just plenty of data taking up storage space, be it on disk or on tape.
Both data storage and data backup cost a lot of money, in both CAPEX and OPEX. On the CAPEX side, you are constantly pressured to buy more storage to store the ever growing data. This leads to greater management and administration costs, both contributing heavily to OPEX. And I have not included the OPEX costs of floor space, power and cooling, and people (training, salary, time and so on), typically adding up to 3-5x the operations costs relative to the capital investments. Such a model of IT operations related to storage cannot continue forever, and storage in the Cloud offers an alternative.
These 2 scenarios – data storage and data backup – are exactly the type of market AWS is targeting. In order to simplify and pacify large corporations, AWS introduced the Amazon Storage Gateway, which eases large corporations into taking some of their IT storage operations to the Cloud in the form of Amazon S3.
The video below shows the Amazon Storage Gateway:
The Amazon Storage Gateway is a software “appliance” that is installed on-premise in the large corporation’s data center. It integrates seamlessly into the LAN and provides an SSL (Secure Socket Layer) connection to Amazon S3. The data being transferred to S3 is also encrypted with AES (Advanced Encryption Standard) 256-bit. Both SSL and AES-256 can give customers a sense of security, and AWS claims that the implementation meets the data storage and data recovery standards used in the banking and healthcare industries.
The data storage and backup service regularly protects the customer’s data in snapshots, giving the customer a rapid recovery platform should they experience on-premise data corruption or disruption. At the same time, the snapshot copies in Amazon S3 can also be loaded into Amazon EBS (Elastic Block Store) volumes, and testing or development environments can be spun up and evaluated with Amazon EC2 (Elastic Compute Cloud). The simplicity of sharing and combining different Amazon services will no doubt give customers peace of mind, easing their adoption of Cloud Computing with AWS.
This new service starts with a 60-day free trial, then moves to a USD$125.00 (about Malaysian Ringgit $400.00) per gateway per month subscription fee. The data storage (inclusive of the backup service) costs only 14 cents per gigabyte per month. For 1TB of data, that is approximately MYR$450 per month. Therefore, excluding the initial setup costs, that comes to a total of about MYR$850 per month, or slightly over MYR$10,000 per year.
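The gateway handles the transport and encryption transparently, but as a rough illustration of the same two layers – TLS/SSL in transit plus AES-256 at rest – here is how an upload to S3 might look with the Python boto3 SDK. The bucket and object names are made up for illustration.

```python
import boto3

# Transfers to S3 go over HTTPS (SSL/TLS) by default, and asking S3 for
# server-side encryption keeps the object encrypted at rest with AES-256.
s3 = boto3.client("s3")

with open("backup-2012-02-01.tar.gz", "rb") as f:
    s3.put_object(
        Bucket="acme-corp-gateway-backups",          # hypothetical bucket
        Key="snapshots/backup-2012-02-01.tar.gz",    # hypothetical object key
        Body=f,
        ServerSideEncryption="AES256",               # AES-256 at rest on the S3 side
    )
```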
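For readers who want to check the arithmetic, here is the back-of-the-envelope calculation; the only assumption is the rough 2012-era exchange rate of about USD 1 = MYR 3.2 implied by the figures above.

```python
# Back-of-the-envelope check of the Amazon Storage Gateway numbers.
USD_TO_MYR = 3.2                     # assumed 2012-era exchange rate

gateway_usd = 125.00                 # per gateway, per month
storage_usd = 0.14 * 1000            # 14 cents/GB/month for ~1TB

monthly_myr = (gateway_usd + storage_usd) * USD_TO_MYR
print(f"~MYR {monthly_myr:,.0f} per month")       # roughly MYR 850
print(f"~MYR {12 * monthly_myr:,.0f} per year")   # slightly over MYR 10,000
```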
At this point, I would like to relate an experience I had a year ago when implementing a so-called private cloud for an oil-and-gas customer in KL. They were using the HP EVS (Electronic Vaulting Service) to an undisclosed HP data center hosting site in the Klang Valley. HP EVS, which was an OEM of Asigra, was not an easy solution to implement, but what was more perplexing was that the customer had a poor understanding of their objectives and of their 5-year plan for keeping the data protected.
When the first 3-4TB of data storage and backup was almost used up, the customer asked for a quotation for an additional 1TB of the EVS solution. The subscription for 1TB was MYR$70,000 per year. That is 7x more than the roughly MYR$10,000 per year AWS would cost! I have to salute the HP sales rep. It must have been a damn convincing sell!
In the long run, the customer would be better off running their storage and backup on-premise with their HP EVA4400; adding an additional 1TB (and even hiring another IT administrator) would have cost a whole lot less.
Amazon Web Services has already been operating in Singapore for the past 2 years, and I am sure they are eyeing Malaysia as part of their regional market. Unless and until Malaysian companies offering cloud services know how to use economies of scale to capitalize on the Cloud Computing market, AWS is always going to be a big threat to CSP companies in Malaysia and a boon for any company seeking cloud computing services anywhere in the world.
I urge customers in Malaysia to start questioning their so-called Cloud Service Providers on whether they can do what AWS is doing. I have low confidence in what most local “cloud computing” companies can deliver right now. I hope they stop window dressing their service offerings and start giving real cloud computing services to customers. And for customers, you must continue to research and find out which cloud services meet your business objectives. Don’t be dazzled by the fancy jargon or technical idealism thrown at you. Always, always find out more, because your business cost is at stake. Don’t be like the customer who paid MYR$70,000 per year for 1TB.
AWS is always innovating and the Amazon Storage Gateway is just another easy-to-adopt step in their quest for world domination.
When someone as important and as prominent as Jason Hoffman reads and follows your blog, you tend to stand up and take notice. When I found out last week that Jason Hoffman, Founder and CTO of Joyent, was doing just that, I was deeply honoured and elated.
My Asian values started kicking in and I felt that I should reciprocate his gracious visits with a piece on Joyent. I have known about Joyent thanks to Bryan Cantrill, their VP of Engineering, because I am bloody impressed with his work on DTrace. I have followed Joyent’s announcements every now and then, and a couple of months ago I even recommended a Service Delivery Manager (Asia Pacific) job posted on Joyent’s website to a buddy of mine. He’s one of the best Solaris engineers I have ever worked with, but the problem with techies is that they tend to wait for everything to fall into place before they do the next thing. Too methodical!
I took some time over the weekend to understand a bit more about Joyent and their solution offerings. They are doing some mighty cool stuff, and if you are a Unix/Linux buff/bigot like me, you would be damn impressed. For those who have experienced Unix, and especially Solaris, there is an unexplained element that describes the fire and the passion of such a techie. I was feeling all the good vibes all over again.
Unfortunately, Joyent is not well known in this part of the world, but I am well aware of their partnership with a local company called XyBase, announced in June last year. XyBase, through its vehicle called Anise Asia, entered into the partnership to resell Joyent’s SmartDataCenter solution. For those who have worked with XyBase in Malaysia, let’s not go there. 😉
Enough chitter-chatter! What’s Joyent about?
Well, for Malaysian IT followers, we are practically drowned in VMware. VMware does a seminar every 1.5 months or so, and they get invited to other vendors’ events ever so frequently as well. My buddy, Mr. Ong Kok Leong, who was an early employee of VMware Malaysia, has been elevated to superstardom thanks to his presence in everything VMware. It’s a good thing, and kudos to VMware for taking advantage of their first-to-market, super gung-ho approach in the last 3 years or so. They have built a sizable lead in the local market, and competitors like Citrix Xen and Microsoft Hyper-V are being left in the dust. I believe only RedHat’s KVM is making a bit of a dent, but they are primarily confined to their own RedHat space. Furthermore, most of VMware’s competitors do not have a strong portfolio and a complete software stack to challenge what VMware has been churning out.
Here’s my take … consider Joyent, because I see Joyent having a very, very strong portfolio that can give VMware a run for its money. Publicly listed VMware has deep pockets to continue their marketing blitz, and because of where they are right now, they have become very pricey and complicated. This blogger intends to level the playing field a bit by sharing more about Joyent and their solutions.
I see Joyent having 4 very strong technologies that differentiate them from the others. These technologies (in no particular order) are:
node.js
ZFS
DTrace
KVM
These technologies have been proven in the field because Joyent has been deploying them, stress testing them and improving on them in their own cloud offering, Joyent Cloud, for the last few years. This is truly “eating your own dogfood” and putting your money where your mouth is. That is a very important consideration when building a Cloud Computing offering, especially in the public cloud space. You need something that is proven, and Joyent Cloud is a testimonial to Joyent’s technology.
So let’s start with a diagram of the Joyent Cloud Software Stack.
Key to the performance of Joyent Cloud is node.js.
node.js, as described on its website, is “a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.” The key is being event-driven and asynchronous, so cloud solutions developed using node.js are able to go faster, scale bigger and respond better. The event-based model follows a programming approach in which the flow of the program is determined by the events that occur.
A simple analogy is when you (in Malaysia) are at McDonald’s. In the past, the McDonald’s staff would service and fulfill your order before they served the next customer, and so on. That was the flow of the past. Some time last year, McDonald’s decided that their front staff would take your order, send you to a queue and then take the order of the next customer. The back-end support staff then fulfill your order, putting that burger and drink on your tray. That is why they are able to serve (take your money) faster and get more things done. This is what I understand about being event-driven, when it is applied in a programming context.
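node.js itself is JavaScript, but the same event-driven, non-blocking idea can be sketched in Python’s asyncio, purely as an analogy: a single “front counter” takes every order immediately while the slow “kitchen” work completes in the background.

```python
import asyncio

async def fulfil_order(customer: int) -> None:
    # The slow "kitchen" work: while this order is being prepared,
    # the event loop is free to serve other customers.
    await asyncio.sleep(1)
    print(f"order for customer {customer} is ready")

async def front_counter() -> None:
    # Take all the orders immediately instead of blocking on each one.
    orders = [asyncio.create_task(fulfil_order(c)) for c in range(1, 6)]
    print("all 5 orders taken, none of them blocked the counter")
    await asyncio.gather(*orders)

asyncio.run(front_counter())   # all 5 orders finish in ~1 second, not 5
```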
node.js has been touted as the new “Ruby-on-Rails” and it is all about low-latency, and concurrency in applications, especially cloud applications. Here’s a video introducing node.js, by Joyent’s very own Ryan Dahl, the creator of node.js.
Besides performance, you would also need a strong and robust file system to ensure security, data integrity and protection of data as it scales. ZFS is a 128-bit, enterprise-grade file system that was developed at Sun more than 10 years ago, and I am a big admirer of the ZFS technology. I have written about ZFS in the past, comparing it with NetApp’s Data ONTAP, and also about ZFS’s self-healing properties in dealing with silent data corruption. In fact, my buddy (him being the more technical one) and I have been developing storage solutions with ZFS.
Cloud Computing is complex, and you have to know what’s happening in the Cloud. Cloud Service Providers must know the real-time behaviour of their cloud properties, whether for performance, resource consumption and contention, bottlenecks, application characteristics, or simply finding problems as quickly as possible. Customers, in turn, must have the ability to monitor, understand and report on what they are consuming and using in the Cloud.
The buzzword in regular use is Analytics, and DTrace is the framework underpinning Joyent’s Cloud Analytics. When it comes to analytics, nothing comes close to what DTrace can do. Most vendors (including VMware) will provide APIs for 3rd party ISVs to develop cloud analytics, but nothing beats having the creator of the cloud technology give you the tools that they use internally. That is what Joyent is giving to the customer: DTrace, a tool that they use themselves. Here’s a screenshot of DTrace in action in Joyent’s SmartDataCenter.
I have always said that you got to see it to know it. Cloud visibility is crucial for the optimal operational efficiency of the cloud.
Joyent already had Solaris Zones technology in its offering, but the missing piece was a bare-metal hypervisor, and last year Joyent added that final piece. KVM (Kernel-based Virtual Machine) was ported to Joyent’s SmartOS, and KVM is more secure and faster than the traditional VMware approach, which relies on binary translation. KVM means that the virtualization kernel interacts directly with the native x86 virtualization on processors that support hardware virtualization extensions. There is a whole religious debate about native virtualization, paravirtualization and binary translation on the web. You can read one here, and as I said, KVM is native virtualization.
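As a quick aside, on a Linux host you can check whether the CPU exposes the hardware virtualization extensions that KVM relies on (Intel VT-x shows up as the vmx flag, AMD-V as svm). A small sketch:

```python
# Quick check (Linux only) for hardware virtualization extensions.
def has_hw_virt(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = line.split(":")[1].split()
                return "vmx" in flags or "svm" in flags   # Intel VT-x or AMD-V
    return False

print("hardware virtualization available:", has_hw_virt())
```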
There is lots more to know about Joyent, but you have to spend some time to learn about it. It is not well known (yet) in this part of the world, and my intention in this blog entry is to disseminate information so that you, the readers, don’t get droned into one thing only.
There are choices, and in the virtualization space it is not always about VMware. VMware deserves to be where they are, but when one comes into power (like VMware), one tends to become less friendly to work with. A customer should not be subjected to this new order of oppression, because businesses exist only when there are customers. And as customers, there are always choices, and Joyent is one good choice.
I am a bit surprised that primary storage deduplication has not taken off in a big way, unlike the times when the buzz of deduplication first came into being about 4 years ago.
When deduplication solutions first came out, they were particularly aimed at the backup data space. Now more popularly known as secondary data deduplication, the technology reduced the inefficiencies of backup and helped spark the frenzy of adulation for companies like Data Domain, Exagrid, Sepaton and Quantum a few years ago. The software vendors were not left out either. Symantec, Commvault, and everyone else in town had data deduplication for backup and archiving.
It was no surprise that EMC battled NetApp and finally won the rights to acquire Data Domain for USD$2.4 billion in 2009. Today, in my opinion, the landscape of secondary data deduplication has pretty much settled and matured. Practically everyone has some sort of secondary data deduplication technology or solution in place.
But the talk of primary data deduplication hardly causes a ripple compared to a few years ago, especially here in Malaysia. Yeah, the IT crowd is pretty fickle that way, because most tend to follow the trend of the moment. Last year it was Cloud Computing and now the big buzzword is Big Data.
We are here to look at technologies to solve problems, folks, and primary data deduplication technology solutions should be considered in any IT planning. And it is our job as storage networking professionals to continue to advise customers about what is relevant to their business and addressing their pain points.
I get a bit cheesed off that companies like EMC or HDS continue to spend their marketing dollars on hyping the trends of the moment rather than using some of those funds to promote good technologies such as primary data deduplication that solve real-life problems. The same goes for most IT magazines, publications and other communication media, which rarely give space to technologies that solve problems on the ground and just harp on hype, fuzz and buzz. It gets a bit ordinary (and mundane) when they try too hard to be extraordinary, because everyone is basically talking about the same freaking thing at the same time, over and over again. (Hmmm … I think I am speaking off topic now … I better shut up!)
We are facing an avalanche of data. The other day, the CEO of Nexenta used the word “data tsunami” but whatever terms used do not matter. There is too much data. Secondary data deduplication solved one part of the problem and now it’s time to talk about the other part, which is data in primary storage, hence primary data deduplication.
What is out there? Who’s doing what in term of primary data deduplication?
NetApp has had A-SIS (now NetApp Dedupe) for years, and they are good in my books. They talk to customers about the benefits of deduplication on their FAS filers. (Side note: I am seeing more benefits in using data compression in primary storage, but I am not going to go there in this entry.) EMC has had primary data deduplication in their Celerra for years, but they hardly talk much about it. It’s on their VNX as well, but again, nobody in EMC ever speaks about their primary deduplication feature.
I have always loved Ocarina Networks’ ECO technology, and Dell hasn’t given much of a hoot about Ocarina since the acquisition in 2010. The technology surfaced a few months ago in the Dell DX6000G Storage Compression Node for its Object Storage Platform, but then again, all Dell talks about is their Fluid Data Architecture from the Compellent division. Hey Dell, you guys are so one-dimensional! Ocarina is a wonderful gem in your jewel case, and yet all your storage guys talk about are Compellent and EqualLogic.
Moving on … I ought to knock Oracle on the head too. ZFS has great data deduplication technology that is meant for primary data and a couple of years back, Greenbytes took that and made a solution out of it. I don’t follow what Greenbytes is doing nowadays but I do hope that the big wave of primary data deduplication will rise for companies such as Greenbytes to take off in a big way. No thanks to Oracle for ignoring another gem in ZFS and wasting their resources on pre-sales (in Malaysia) and partners (in Malaysia) that hardly know much about the immense power of ZFS.
But an unexpected source coming from Microsoft could help trigger greater interest in primary data deduplication. I have just read that the next version of Windows Server OS will have primary data deduplication integrated into NTFS. The feature will be available in Windows 8 and the architectural view is shown below:
The primary data deduplication in NTFS will be a feature add-on for Windows Server users. It is implemented as a filter driver on a per-volume basis, with each volume a complete, self-describing unit. It is cluster-aware, and fully crash-consistent on all operations.
The technology is Microsoft’s own, built from scratch, and it will work to position Hyper-V as a strong enterprise choice in its battle with VMware for the server virtualization space. Mind you, VMware already has a big, big lead, and this is just something that Microsoft must do to keep Hyper-V in the game of catch-up. Otherwise, the gap between Microsoft and VMware in the server virtualization space will grow even greater.
I don’t have the full details of this, but I read that the NTFS primary deduplication chunk sizes will be between 32KB and 128KB, and that deduplication will be post-processing rather than inline.
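I have not seen Microsoft’s actual algorithm, but the general technique of variable-size, content-defined chunking within a 32KB-128KB window followed by a post-processing dedupe pass can be sketched roughly like this; the rolling hash, mask and data structures are purely illustrative assumptions.

```python
import hashlib

MIN_CHUNK = 32 * 1024       # 32KB lower bound, as reported for NTFS dedup
MAX_CHUNK = 128 * 1024      # 128KB upper bound
MASK = 0x1FFF               # cut when the low bits of a cheap rolling value are zero

def chunk(data: bytes) -> list[bytes]:
    """Rough content-defined chunking: cut at content-dependent boundaries,
    but never produce a chunk below 32KB or above 128KB."""
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling = ((rolling * 31) + b) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (rolling & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def post_process_dedupe(chunks: list[bytes]) -> dict[str, bytes]:
    """The post-processing pass: keep one copy per unique chunk fingerprint."""
    store: dict[str, bytes] = {}
    for c in chunks:
        store.setdefault(hashlib.sha256(c).hexdigest(), c)
    return store

sample = bytes(1024 * 1024)                     # 1MB of zeroes, the easy case
print(len(post_process_dedupe(chunk(sample))), "unique chunk(s)")   # -> 1
```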
With Microsoft introducing their technology soon, I hope primary data deduplication will get some deserved accolades, because I think most companies are really not doing justice to the great technologies they have in their jewel cases. And I hope Microsoft, with all its marketing savviness and adeptness, will do some justice to a technology that solves real-life data problems.
I bid you good luck, Primary Data Deduplication! You deserve better.
Happy New Year! I am looking forward to the year of 2012.
Lately, I have been involved in Cloud Computing forums and I have been reading articles on Cloud Computing. I even took up a 5-day course on Cloud Computing in order to prepare myself for the inevitable. Yes, Cloud Computing is here to stay, but we joke about it, don’t we? I think the fun word of Cloud Computing is “cloudy“, which is indeed very true.
As I ingest more and more information about Cloud Computing, the way different people have different perspectives and opinions about it has never been “cloudier”. It is fuzzy, hazy and confusing. And in the forums, many were saying that virtualization is Cloud Computing. What do you think?
I found one definition of Cloud Computing that is definitive, yet simple. This definition comes from the National Institute of Standards and Technology (NIST) of the US Department of Commerce. In its publication SP 800-145, NIST defines Cloud Computing as having the following 5 essential characteristics (reproduced verbatim):
On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.
Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).
Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.
Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.
Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
The 5 essential characteristics are very important in determining whether Virtualization = Cloud Computing, and I know there are a lot of people out there who say that Virtualization equates to Cloud Computing. Let’s see the table below:
Some readers might argue about the “YES” or “NO” in the above comparison, but I do not want to dwell on the matter. Yes, I believe that many of these things are doable in their own right, but with different levels of complexity and cost. The objective is to settle the arguments and confusion around Cloud Computing, accept some definitive terms and move on.
As you can see from the table above, Virtualization does not equate to Cloud Computing. We can say that Virtualization enables Cloud Computing to happen; it is the precursor to Cloud Computing.
In Cloud Computing, there are different Service Models. NIST defines 3 different Service Models. They are:
Software as a Service (SaaS). The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).
And NIST went on to define the Deployment Models of Cloud Computing as listed below:
Private cloud. The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.
Community cloud. The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.
Public cloud. The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.
Hybrid cloud. The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
There! Cloud Computing by the definition of NIST. It is simple, easily understood and, most importantly, it gives us context amid the sea of confusion. Here’s the link to NIST’s PDF.
We can argue till the cows come home but it is best to stick to a simple definition of Cloud Computing and focus on other more important aspects of the cloud.
I hope to share more of my Cloud Computing experience with you and storage will have a big part to play in it.
I was in Singapore last week attending the Cloud Infrastructure Services course.
In the class, one of the foundation components of Cloud Computing discussed was, of course, storage. As the students and the instructor talked about storage, one very interesting argument surfaced: what does it mean when storage is offered on the cloud? A lot of people assumed that Cloud Storage would be for their databases and their virtual machines, which of course is true when the communication between the applications, virtual machines and databases stays within the local area network of the Cloud Service Provider (CSP).
However, if the storage is offered through the cloud to applications that are sitting on-premise in the customer’s server room, then we have to think twice about how we perceive Cloud Storage. In this aspect, the Cloud Storage offered by the CSP is an Infrastructure-as-a-Service (IaaS) offering where the key service is storage. We have to recognize that this storage functions as a data container, and is usually not there for I/O performance reasons.
Though this concept will probably be easily understood by storage professionals like us, it can cause a bit of confusion for someone new to the concepts of Cloud Computing and Cloud Storage. This confusion, unfortunately, is caused by many of us who are vendors or solution providers, or even publications and magazines. We are responsible for disseminating correct information to customers, but due to our lack of knowledge and experience in this extremely new market of Cloud Storage, we have created FUD (Fear, Uncertainty and Doubt) and hype.
Therefore, it is the duty of this blogger to clear the vapourware and hopefully pass on the right information to accelerate the adoption of Cloud Storage in the near future. At this moment, given factors such as network costs, high network latency and the lack of key network technologies similar to the LAN in Cloud Computing, Cloud Storage is, most of the time, for data containership and archiving only. And there are no IOPS or other performance-related statistics associated with Cloud Storage. If any engineer or vendor tells you that they have the fastest Cloud Storage in the industry, do me a favour. Give him/her a knock on the head for me!
Of course, as technologies evolve, this could change in the near future. For now, Cloud Storage is a container, NOT high performance storage in the cloud, and it is usually not meant for transactional data. There are many vendors in the Cloud Storage space, from real CSPs to storage companies offering repackaged storage boxes that are “cloud-ready”. A good example of a CSP offering Cloud Storage is Amazon S3 (Simple Storage Service). Storage vendors such as EMC and HDS are repackaging and rebranding their storage technologies as object storage, ready for the cloud. EMC Atmos is really a repackaged and rebranded Centera with some slight modifications, while HDS, using their archiving solution, has HCP (aka HCAP). There’s nothing wrong with what EMC and HDS have done, but before the world of Cloud Computing was overhyped, these platforms were meant for immutable data archiving. Just thought you should know.
One particular company that captured my imagination, and addresses the storage performance portion, is Nasuni. They are quite inventive with their Cloud Storage Gateway approach. Nasuni has come up with a Cloud Storage Gateway filer appliance, which can be either a physical 1U server or a VMware or Hyper-V virtual appliance sitting on-premise at the customer’s site.
The key to this is “on-premise”, which allows much faster access to data because it is locally cached in the Nasuni filer appliance itself. This addresses the Cloud Storage “performance” piece, though Nasuni does not claim any performance statistics for such an implementation. The clever bit is that it handles data or files that are transactional in nature, i.e. NFS or CIFS, serving them “locally”. (I wonder if the Nasuni filer does iSCSI as well. Hmmmm …)
In the Nasuni architecture, they “break up” their “Cloud Storage” into 2 pieces. Piece #1 sits on-premise, at the customer site, and acts as a bridge to Piece #2, which sits in a Cloud Storage service. For a simplified view, have a look at the diagram below:
Piece #1 is the component that handles some of the transactional traffic related to files. In a more technical diagram below, you can see that the Nasuni filer addresses the file sharing portion, using the local disks on the filer appliance as a local caching mechanism.
Furthermore, older file pieces are whisked away to any Cloud Storage using the Cloud Connector interface, giving the customer a sense that their storage capacity can be limitless if they want it to be (for a fee, of course). At the same time, the Nasuni filer supports thin provisioning and snapshots. How cool is that!
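Nasuni has not published its internals, so the snippet below is only a toy sketch of the general gateway-cache idea: hot files are served from local disk, a copy of everything is protected in object storage, and the coldest files are evicted from the local cache. The class, method names and eviction policy are my own assumptions, not Nasuni’s implementation.

```python
import time

class CloudGatewayCache:
    """Toy sketch of a cloud storage gateway: hot files live on local disk for
    fast NFS/CIFS-style access, while every file also has a copy in the cloud."""

    def __init__(self, cloud, capacity: int):
        self.cloud = cloud          # any object store client exposing get(name)/put(name, data)
        self.capacity = capacity    # how many files the local cache may hold
        self.local = {}             # name -> (data, last_access_time)

    def read(self, name: str) -> bytes:
        if name not in self.local:                     # cache miss: pull from the cloud
            self.local[name] = (self.cloud.get(name), time.time())
            self._evict_if_needed()
        data, _ = self.local[name]
        self.local[name] = (data, time.time())         # refresh the access time
        return data

    def write(self, name: str, data: bytes) -> None:
        self.local[name] = (data, time.time())
        self.cloud.put(name, data)                     # protect it in the cloud too
        self._evict_if_needed()

    def _evict_if_needed(self) -> None:
        while len(self.local) > self.capacity:
            coldest = min(self.local, key=lambda n: self.local[n][1])
            del self.local[coldest]                    # safe: a copy lives in the cloud
```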
The Cloud Storage piece (Piece #2) is used for the data container and archiving reasons. This component can be sitting and hosted at Amazon S3, Microsoft Azure, Rackspace Cloud Files, Nirvanix Storage Delivery Network and Iron Mountain Archive Services Platform.
The data communication and transfer between the Nasuni filer and the Cloud Storage is secure: encrypted, deduplicated and compressed, giving it the efficiency and security that most customers would be concerned about. The diagram below explains the data communication and data transfer bit.
In this manner, the Nasuni filer could take the place of traditional NAS platforms and potentially provide a much lower total cost of ownership (TCO) in the long run, even though Nasuni does not pretend to be a NAS replacement. To me, this concept is very inventive and could potentially change the way we perceive file sharing and file servers, obscuring and blurring the concept of NAS.
Again, I would like to reiterate that Nasuni does not attempt to say their solution is a NAS or a performance-based Cloud Storage but what they have cleverly packaged seems to be appealing to customers. Their customer base has grown 78% in Q2 of 2011. It’s just too bad they are not here in Malaysia or this part of the world (yet).