It has not caused severe pain yet, but it will. Storage is cheap, but as capacity grows, it will eventually hit a limit where storage becomes difficult to maintain from a cost perspective.
I wrote about the lack of attention given to primary storage deduplication solutions in the local industry. Perhaps deduplication has matured to the point that it has become a no-brainer, or perhaps customers are already sick and tired of the word “dedupe”. Either way, we should not be distracted from the fact that data footprint reduction (DFR) in the generic sense, or storage efficiency as a fancy marketing term, must be applied somewhere to slow down the purchase of storage capacity.
Storage is getting fatter, and storage vendors’ revenue is getting fatter along with it. While this is good for the vendors’ pockets, customers have to face higher costs associated with:
- Power, Cooling and Floorspace
- Administration and management
- Bandwidth
- Resource utilization
None of this is prudent storage management, because fat storage is bad, just like human beings getting fatter. Storage, too, must go on a diet, and deduplication is one of the few solutions out there. However, I have argued before that deduplication merely shrinks the container that holds the bits of data, completely unaware of what the content is. Deduplication does not shrink the data itself, and if the data is largely unique, deduplication does not help in reducing storage capacity. There is no real advantage unless the data footprint reduction (DFR) technology is content aware. (Note that I am using DFR as a generic term rather than data deduplication. The reason is obvious.)
That is why data deduplication technologies do not work well with seismic files or encoded video files: the files are already highly optimized. But there is a technology that can look deeper into such unstructured files and deliver storage capacity reduction with specific algorithms for specific types of files and file objects. That technology, I believe, is the truest form of data footprint reduction, and it is called Native Format Optimization (NFO).
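To make that point concrete, here is a minimal sketch (in Python, and in no way any vendor’s actual implementation) of content-unaware, fixed-size chunk deduplication. It only matches identical byte runs, so data that is already compressed or encoded, like seismic or video files, barely dedupes at all:

```python
import gzip
import hashlib


def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Logical size vs. unique size for fixed-size chunk deduplication."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return len(chunks) / len(unique) if unique else 1.0


# Highly repetitive plain text dedupes very well...
text = b"the same log line, repeated over and over again\n" * 100_000
print(f"plain text : {dedupe_ratio(text):.1f}:1")

# ...but a compressed/encoded copy of the very same data barely dedupes,
# which is roughly the situation with seismic and encoded video files.
print(f"compressed : {dedupe_ratio(gzip.compress(text)):.1f}:1")
```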
I want to relate an old story from a few years back, when I brought an EMC BURA (Backup, Recovery & Archive, a precursor to its present BRS division) senior manager to see a highly respected technical manager at Schlumberger in Malaysia. Schlumberger is the world’s largest oilfield services company; it provides seismic analysis and interpretation software, and its seismic files are highly encoded and compressed.
As usual, the senior manager, being a typical sales guy, started blabbering about how great Data Domain (this was just after the EMC acquisition) was and how it could dedupe any kind of file at 20:1 (exaggerated to 500:1 for certain text files), even seismic files. I was signalling to the EMC senior manager to stop his bullsh*t, but he went on and on. In the end, the Schlumberger technical manager politely told the EMC senior manager to shut up, because he had little understanding of what seismic files are like.
Now, back to Native Format Optimization (NFO) technology. In a nutshell, NFO plays tricks with the human visual system. The goal is to reduce the size of unstructured files without reducing the perceived visual quality of their content (text, texture, colour, resolution, depth, hue, contrast, etc.).
Have a look at these 2 files. One is optimized with NFO and one is un-optimized. Can you tell the difference?
The human visual system is known to be:
- Less sensitive to high frequency of colour variation
- More sensitive to brightness than colour variation
- Less sensitive to background colour in lower resolution
- More sensitive to a picture’s motion than to its texture
Therefore, the eyes perceive an image mostly against its lowest-quality baseline. I got this information from George Crump’s Storage Switzerland article.
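Neither Ocarina nor balesio publishes the details of its algorithms, but here is a toy sketch (my own illustration, using Pillow) of the perceptual principle above: keep the brightness (luma) channel of an image at full resolution and carry the colour (chroma) channels at a lower resolution, exploiting exactly the sensitivities listed above:

```python
# Toy illustration only -- not Ocarina's or balesio's actual algorithm.
# Keep brightness (luma) at full resolution, but carry colour (chroma)
# at a quarter of the pixels, since the eye notices brightness detail
# far more than colour detail.
from PIL import Image


def chroma_subsample(src_path: str, dst_path: str) -> None:
    img = Image.open(src_path).convert("YCbCr")
    y, cb, cr = img.split()

    # Halve each chroma channel in both dimensions, then scale it back up.
    half = (max(img.width // 2, 1), max(img.height // 2, 1))
    cb = cb.resize(half).resize(img.size)
    cr = cr.resize(half).resize(img.size)

    Image.merge("YCbCr", (y, cb, cr)).convert("RGB").save(dst_path, quality=85)


# The (hypothetical) photo.jpg and its optimized copy should be hard to
# tell apart visually, but the optimized copy is typically smaller.
chroma_subsample("photo.jpg", "photo_optimized.jpg")
```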
Because NFO-optimized files remain in their native format, they do not need to be rehydrated the way deduped files do.
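Here is a minimal sketch of what that means in practice, assuming a simple chunk store keyed by fingerprints (nothing vendor-specific): a deduplicated file has to be rehydrated, i.e. reassembled chunk by chunk, before any application can use it, whereas an NFO-optimized file is opened directly by its usual application:

```python
from typing import Dict, List


def read_deduped(recipe: List[bytes], chunk_store: Dict[bytes, bytes]) -> bytes:
    """Rehydrate: fetch every chunk by its fingerprint and stitch them together."""
    return b"".join(chunk_store[fingerprint] for fingerprint in recipe)


def read_native(path: str) -> bytes:
    """An NFO-optimized file needs no rehydration step at all."""
    with open(path, "rb") as f:
        return f.read()
```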
The capacity reduction savings are tremendous, and because the NFO approach is content aware, the benefits translate to higher cost savings through:
- Reduction of power, cooling and floorspace
- Reduction in data management and administration tasks, especially backup
- Improved bandwidth and improved disaster recovery
- Higher performance
- Delayed storage capacity purchase
- many more
After Ocarina’s acquisition by Dell in 2010, a search on the web revealed that probably only one vendor in Europe has boldly continued to enhance NFO technology in its products. That company is balesio, and you can read about its NFO technology here.