Filesystems – Page 12

Don’t get too drunk on Hyper Converged

By cfheoh | August 16, 2015 - 9:19 am |August 16, 2015 Acquisition, Appliance, Cisco, Data Archiving, Data Availability, Data Management, Deduplication, Disks, EMC, Filesystems, Gartner, High Performance Computing, Hyperconvergence, NetApp, Nutanix, Performance Benchmark, Reliability, Scale-out architecture, Simplivity, Software Defined Storage, Software-defined Datacenter, Solid State Devices, Storage Market Share, Virtualization

Oops, excuse me but your silo is showing

By cfheoh | April 8, 2015 - 8:07 am |April 10, 2015 Appliance, Backup, Cloud, Data, Data Archiving, Data Availability, Data Management, Filesystems, High Performance Computing, Hyperconvergence, Jon Toigo, NAS, NetApp, Nutanix, Object Storage, Performance Benchmark, Reliability, Scale-out architecture, Security, Simplivity, SNIA, Software Defined Storage, Software-defined Datacenter, Solid State Devices, Storage Tiering, Unified Storage, Virtualization, VMware


It is the morning of the SNIA Global Steering Committee reporting session, which starts soon. I am in the office extremely early, waiting for my turn to share the happenings in SNIA Malaysia.

Of late, I have been getting a lot of calls to catch up on hot technologies, notably All Flash Storage arrays and hyper-converged infrastructure. Even though I am now working for Interica, a company that focuses on Oil & Gas exploration and production software, my free coffee sessions with folks from the IT side have not diminished. I recall a week back in mid-March when I had a coffee overdose!

Flash storage and hyperconvergence are HOT! Despite the hype and frenzy around both, I still believe that integrating either one, or both, has an effect that many IT managers overlook. That effect is a data silo.


The reverse wars – DAS vs NAS vs SAN

By cfheoh | March 13, 2015 - 9:54 am |March 13, 2015 10Gigabit Ethernet, Appliance, ATA over Ethernet, Avere, CIFS, Cloud, Data, Disks, EMC, Fibre Channel, Filesystems, HDS, High Performance Computing, Hyperconvergence, iSCSI, Linux, Memory Cloud, Microsoft, NAS, NetApp, Nexenta, NFS, Nutanix, NVMe, Object Storage, Open Compute Project, Openstack, Panasas, PCIe, Performance Benchmark, Performance Caching, Scale-out architecture, SCSI, Seagate, Server SAN, Simplivity, SMB, Software Defined Storage, Solaris, Solid State Devices, Storage Optimization, Storage Tiering, Unified Storage, Virident, Virsto, Virtualization, VMware, XtremIO


It has been quite an interesting 2 decades.

In the beginning (starting in the early to mid-90s), SAN (Storage Area Network) was the dominant architecture. DAS (Direct Attached Storage) was on the wane, as the channel-like throughput of the Fibre Channel protocol, coupled with FC's 24-bit fabric addressing of roughly 16 million ports, obliterated parallel SCSI, which could only handle 16 devices per bus and throughput of up to 80 (later 160 and 320) MB/sec.
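To put that addressing gap into numbers, here is a quick back-of-the-envelope calculation in Python. It assumes a wide (16-bit) parallel SCSI bus, where every device, the host adapter included, takes one of 16 IDs, and a switched FC fabric using 24-bit port addresses (domain, area and port, 8 bits each). The FC figure is a theoretical ceiling, since some addresses are reserved in practice.

```python
# Wide (16-bit) parallel SCSI: 16 IDs per bus, and the host bus adapter
# occupies one of them.
SCSI_WIDE_BUS_IDS = 16
HBA_IDS = 1

# Fibre Channel switched fabric: 24-bit port address (domain, area, port).
FC_ADDRESS_BITS = 24

scsi_targets = SCSI_WIDE_BUS_IDS - HBA_IDS
fc_addresses = 2 ** FC_ADDRESS_BITS  # theoretical; some addresses are reserved

print(f"SCSI devices per wide bus: {scsi_targets}")
print(f"FC fabric address space:  {fc_addresses:,}")
print(f"Ratio: roughly {fc_addresses // scsi_targets:,}x")
```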

NAS, defined by the CIFS/SMB and NFS protocols, was happily chugging along on 100 Mbit/sec networks, occasionally getting sucked into arguments about why SAN was better than NAS. I was already steeped in NFS, because I was pretty much a SunOS/Solaris bigot back then.

When I joined NetApp in Malaysia in 2000, the NAS-SAN wars were already going on, waiting for me. NetApp (or Network Appliance, as it was known then) was trying to grow beyond its dot-com roots into the enterprise space, and guys like EMC and HDS were frequently trying to put NetApp down.

“It’s a toy…” was the most common jibe I got in regular engagements, until EMC suddenly decided to attack Network Appliance directly with their EMC CLARiiON IP4700. EMC guys would fondly remember this as the “NetApp killer”.

Why demote archived data access?

By cfheoh | March 10, 2015 - 1:21 pm |March 10, 2015 10Gigabit Ethernet, Appliance, Avere, Big Data, CIFS, Cloud, Data, Data Archiving, Data Availability, Data Management, Disks, EMC, Filesystems, HDS, High Performance Computing, NAS, NetApp, Nexenta, NFS, Performance Benchmark, Performance Caching, Reliability, ROBO, SATA, Scale-out architecture, SMB, Solid State Devices, Storage Optimization, Storage Tiering


We are all familiar with the concept of data archiving. Passive data is archived from production storage and migrated to a slower and often cheaper storage medium, such as tape or SATA disks. Hence the terms nearline and offline data. With that, IT constantly reminds users that archived data is infrequently accessed, and therefore they have to accept slower access to passive, archived data.

Business conditions have certainly changed, and the need for data to be 100% online is becoming more relevant. The new competitive nature of business dictates that data must be at our fingertips, because speed and agility are the new competitive advantage. Often the total amount of data, production and archived combined, runs into hundreds of TBs, even PetaBytes!

The industries I am familiar with – Oil & Gas, and Media & Entertainment – are facing this situation. These industries have a deluge of files and unstructured data in their archives, much of it dormant, inactive and sitting on old tapes of a bygone era. Yet these files and unstructured data have the most potential to be explored, mined and analyzed to realize their value to the organization. In short, the archived data and files must be democratized!

The flip side is that when archived files and unstructured data are coupled with a slow access interface or unreliable storage infrastructure, the value of archived data is downgraded, because the poor access path frustrates the applications and business requirements that depend on it. How would organizations value archived data more if the access path to it is so damn hard???!!!

An interesting solution fell into my lap some months ago, and putting A and B together (A + B), I believe the access path to archived data can be unbelievably high-performance, simple and transparent, and, most importantly, can remove the BLOODY PAIN of FILE AND DATA MIGRATION! Storage administrators and engineers familiar with data migration, especially when the migration runs into hundreds of TBs or even PBs, know what I mean!
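To show why migration at that scale hurts, here is a rough Python estimate of copy time. The 10Gigabit Ethernet link and the 70% effective throughput are my own hypothetical assumptions, and the sketch ignores verification passes, cutover windows and retries, all of which only make things worse.

```python
def migration_days(capacity_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Days to copy capacity_tb (decimal TB) over a link_gbps link.

    efficiency is a hypothetical fraction of line rate actually achieved
    once protocol overhead and small-file metadata operations are paid for.
    """
    bytes_total = capacity_tb * 1e12
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_sec / 86_400  # seconds -> days


for tb in (100, 500, 1000):
    print(f"{tb:>5} TB over 10GbE: {migration_days(tb, 10):.1f} days")
```

Even at a perfect 100% of line rate, a petabyte takes over nine days of continuous copying on a single 10GbE link.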

I have known this solution for some time now, because I have been avidly following its development since its founders, who had come to NetApp through their Spinnaker venture, left NetApp to start Avere Systems.


Hail Hydra!

By cfheoh | November 20, 2014 - 3:40 pm |November 20, 2014 Actifio, Appliance, Backup, Big Data, CIFS, Data Management, Deduplication, EMC, Filesystems, HP, NAS, NFS, Object Storage, RAID, Reliability, Scale-out architecture, Virtualization


The last session of Storage Field Day 6, on November 7th, took me and the other delegates to NEC. There was an obvious, yet eerie, silence among everyone about this visit. NEC? Are you kidding me?

NEC isn’t exactly THE exciting storage company in Silicon Valley, yet I was pleasantly surprised by their HydraStor prowess. It is indeed quite a beast, with published backup throughput of 4PB/hour and scalability to 100PB of capacity. Most impressive indeed, and HydraStor deserves this blogger’s honourable architectural dissection.

HydraStor is NEC’s grid-based, scale-out storage platform with an object storage backend. The technology is powered by the DynamicStor™ software, a distributed file system laid over the HydraStor grid architecture. It also has the DataRedux™ technology, which provides global in-line deduplication as HydraStor ingests data for data protection, replication, archiving and WORM purposes. It is a massive data consolidation platform, storing gazillion loads of data (100PB you say?) for short-term and long-term retention and recovery.

The architecture is indeed solid, and its data availability goes beyond traditional RAID-level resiliency. HydraStor employs its proprietary erasure coding, called Distributed Resilient Data™. The resiliency knob can be configured to withstand up to 6 concurrent disk or node failures, but is set by default to a resiliency level of 3.

We can quickly deduce that DynamicStor™, DataRedux™ and Distributed Resilient Data™ are the technology pillars of HydraStor. How do they work, and how do they work together?

Let’s look a bit deeper into the HydraStor architecture.

HydraStor is made up of 2 types of nodes:

Accelerator Nodes
Storage Nodes

The Accelerator Nodes (AN) are the access nodes. They interface with the HydraStor front end, which could be CIFS, NFS or OST (Open Storage Technology). The AN nodes chunk the incoming data and perform in-line deduplication at very high speed, reaching up to 300TB/hour, which is blazingly fast!
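For a feel of what those numbers mean per second, here is a trivial unit conversion in Python. It assumes decimal units (1 TB = 1000 GB); whether NEC’s published figures count logical (pre-deduplication) data or physical data is not stated above, so treat the results as indicative only.

```python
def tb_per_hour_to_gb_per_sec(tb_per_hour: float) -> float:
    """Convert decimal TB/hour into decimal GB/second."""
    return tb_per_hour * 1000 / 3600


print(f"300 TB/hour ~ {tb_per_hour_to_gb_per_sec(300):.1f} GB/s")   # 83.3
print(f"4 PB/hour   ~ {tb_per_hour_to_gb_per_sec(4000):.1f} GB/s")  # 1111.1
```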

The AN nodes also run DynamicStor™, handling the performance heavy-lifting portion of HydraStor. The chunked data from the AN nodes is then passed on to the Storage Nodes (SN), where it is further “deduped in-line” to determine whether the chunks are unique. It is a two-step inline deduplication process. Below is a diagram showing the ANs built above the SNs in the HydraStor grid architecture.
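The two-step idea can be sketched as a toy model in Python. Everything here is hypothetical: the class and method names are mine, SHA-256 is an assumed hash, and the AN’s local “seen” set is just one plausible way a first-pass filter could work. It is not NEC’s implementation, only an illustration of an AN filtering duplicates before the SN makes the authoritative uniqueness check.

```python
import hashlib


class StorageNode:
    """Hypothetical SN: authoritative store of unique chunks, keyed by hash."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put_if_absent(self, digest: str, chunk: bytes) -> bool:
        """Second dedupe step: store the chunk only if its hash is new."""
        if digest in self.chunks:
            return False
        self.chunks[digest] = chunk
        return True


class AcceleratorNode:
    """Hypothetical AN: hashes incoming chunks and skips ones it has already sent."""

    def __init__(self, sn: StorageNode) -> None:
        self.sn = sn
        self.seen: set[str] = set()  # local hint cache, not authoritative

    def ingest(self, chunks: list[bytes]) -> tuple[int, int]:
        """Return (chunks sent to the SN, chunks the SN actually stored)."""
        sent = stored = 0
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in self.seen:  # first dedupe step, on the AN
                continue
            self.seen.add(digest)
            sent += 1
            if self.sn.put_if_absent(digest, chunk):  # second step, on the SN
                stored += 1
        return sent, stored


sn = StorageNode()
an1, an2 = AcceleratorNode(sn), AcceleratorNode(sn)
print(an1.ingest([b"alpha", b"beta", b"alpha"]))  # (2, 2)
print(an2.ingest([b"beta", b"gamma"]))            # (2, 1)
```

The second AN still sends “beta” across, because it has never seen it, but the SN recognises the hash and stores nothing new.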

The HydraStor grid architecture is also very scalable, allowing dynamic scale-in and scale-out of both ANs and SNs. AN nodes and SN nodes can be added to or removed from the system, auto-configuring and auto-optimizing while everything stays online. This capability further strengthens the reliability and resiliency of HydraStor.

Moving on to DataRedux™. DataRedux™ is HydraStor’s global in-line data deduplication technology. It performs dedupe at the sub-file level, with a variable-length window. This is performed at both the AN and SN levels, chunking the data and creating unique hash values. All unique chunks are further compressed with a modified LZ compression algorithm, shrinking the data to its optimized footprint on disk. To maintain global in-line deduplication, the hash table is available across the HydraStor cluster.
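Variable-length (content-defined) chunking is what lets dedupe survive small insertions that would shift every fixed-size block. Below is a minimal Python sketch assuming a gear-style rolling hash for choosing chunk boundaries, SHA-256 for chunk identity, and zlib (DEFLATE, an LZ77-family compressor) as a stand-in for HydraStor’s “modified LZ” algorithm. The parameters (13-bit mask, 2 KiB minimum, 64 KiB maximum) are hypothetical, and the plain dict stands in for the cluster-wide hash table.

```python
import hashlib
import random
import zlib

# Gear table: 256 pseudo-random 32-bit values, fixed seed for repeatability.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, mask: int = 0x1FFF,
               min_size: int = 2048, max_size: int = 65536):
    """Yield content-defined, variable-length chunks of data.

    A boundary is declared where the low bits of a gear rolling hash are all
    zero (about 1 in 8192 bytes with mask=0x1FFF), once the chunk has reached
    min_size; max_size forces a cut regardless.
    """
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


def dedupe_and_compress(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store each unseen chunk compressed; return the ordered list of chunk hashes."""
    recipe = []
    for chunk in cdc_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:                   # global lookup (dict as stand-in)
            store[digest] = zlib.compress(chunk)  # DEFLATE standing in for "modified LZ"
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    return b"".join(zlib.decompress(store[d]) for d in recipe)


store: dict[str, bytes] = {}
base = random.Random(7).randbytes(200_000)
edited = base[:100_000] + b"INSERTED" + base[100_000:]
r1 = dedupe_and_compress(base, store)
r2 = dedupe_and_compress(edited, store)
assert restore(r2, store) == edited
print(f"{len(r1) + len(r2)} chunk references, {len(store)} unique chunks stored")
```

Because boundaries depend on content rather than offsets, the chunks ahead of the 8-byte insertion are identical in both versions and are stored only once; boundaries after the edit typically realign shortly afterwards.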

The unique data chunks resulting from deduplication and compression are then written to disks using the configured Distributed Resilient Data™ (DRD) algorithm, at its set resiliency level.

With DRD’s erasure coding, the data is broken up into multiple fragments, and parity fragments are computed for each group of fragments. If the resiliency level is set to 3 (the default), the data is broken into 12 pieces: 9 data fragments + 3 parity fragments. The 3 parity fragments correspond to the resiliency level of 3. See the diagram below of the 12 fragments spread across a group of selected disks in the storage pool of the Storage Nodes.
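The arithmetic of a 9 + 3 layout is easy to check. This Python sketch assumes DRD behaves like a maximum distance separable code (Reed-Solomon style), where any 9 of the 12 fragments are enough to rebuild the data; the description above only says that 3 parity fragments correspond to resiliency level 3, so that property is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureLayout:
    data_fragments: int
    parity_fragments: int

    @property
    def total(self) -> int:
        return self.data_fragments + self.parity_fragments

    @property
    def efficiency(self) -> float:
        """Fraction of raw capacity that holds user data."""
        return self.data_fragments / self.total

    @property
    def overhead(self) -> float:
        """Raw bytes stored per byte of user data."""
        return self.total / self.data_fragments

    def survives(self, lost_fragments: int) -> bool:
        """Assuming an MDS code: any data_fragments survivors can rebuild the data."""
        return lost_fragments <= self.parity_fragments


default = ErasureLayout(data_fragments=9, parity_fragments=3)
print(default.total, f"{default.efficiency:.0%}", f"{default.overhead:.2f}x")  # 12 75% 1.33x
print(default.survives(3), default.survives(4))                                # True False
```

For comparison, surviving 3 lost copies with plain replication would need 4 full copies, i.e. 25% efficiency instead of 75%.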

If HydraStor experiences a disk or node failure that results in the loss of one or more fragments, the DRD self-healing function auto-rebuilds the lost fragments and auto-reconfigures them onto another set of disks, maintaining the level of 3 parities.
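Self-healing boils down to a placement problem: find the fragments that lived on failed disks and rebuild them elsewhere, on disks not already holding a fragment of the same stripe. The Python sketch below assumes a simple random placement policy and hypothetical disk names; a real system would also spread fragments across nodes and recompute the lost fragments from the survivors, which is only noted in a comment here.

```python
import random


def place(fragments: int, disks: list[str], rng: random.Random) -> dict[int, str]:
    """Hypothetical policy: put each fragment on a different, randomly chosen disk."""
    if fragments > len(disks):
        raise ValueError("need at least one disk per fragment")
    return dict(enumerate(rng.sample(disks, fragments)))


def heal(layout: dict[int, str], failed: set[str], disks: list[str],
         parity: int, rng: random.Random) -> dict[int, str]:
    """Re-home fragments lost on failed disks, restoring full resiliency."""
    healed = {frag: disk for frag, disk in layout.items() if disk not in failed}
    lost = sorted(frag for frag in layout if frag not in healed)
    if len(lost) > parity:
        raise RuntimeError("more fragments lost than parity covers: data unrecoverable")
    in_use = set(healed.values())
    spares = [d for d in disks if d not in failed and d not in in_use]
    if len(lost) > len(spares):
        raise RuntimeError("not enough healthy disks to restore full resiliency")
    for frag, disk in zip(lost, rng.sample(spares, len(lost))):
        # In a real system the fragment is first recomputed from the survivors.
        healed[frag] = disk
    return healed


rng = random.Random(0)
disks = [f"sn{n}-disk{d}" for n in range(4) for d in range(6)]  # 24 hypothetical disks
layout = place(12, disks, rng)
failed = {layout[0], layout[5]}
new_layout = heal(layout, failed, disks, parity=3, rng=rng)
print(len(new_layout), len(set(new_layout.values())))  # 12 12
```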

The resiliency level, as mentioned earlier, can be set as high as 6, allowing HydraStor to survive 6 concurrent disk or node failures in the grid. See below for how the autonomous DRD recovery works:

Despite lacking the razzle-dazzle of most Silicon Valley storage startups and upstarts, credit must be given where credit is due. NEC HydraStor is indeed a strong show stopper.

However, in a market as fickle as storage, deduplication solutions such as HydraStor, EMC Data Domain and HP StoreOnce are being superseded by Copy Data Management technology, touted by Actifio. It was rumoured that EMC restructured its entire BURA (Backup Recovery Archive) division into DPAD (Data Protection and Availability Division) to go after the burgeoning copy data management market.

It would be good if NEC can take notice and turn their HydraStor “supertanker” towards the Copy Data Management market. That would be something special to savour.

P/S: NEC. Sorry about the title. I just couldn’t resist it 😉

MASSive, Impressive, Agile, TEGILE

By cfheoh | November 20, 2014 - 3:03 pm |November 20, 2014 Analytics, Appliance, CIFS, Cloud, Data, Deduplication, Fibre Channel, Filesystems, iSCSI, NetApp, NFS, NVMe, PCIe, Performance Benchmark, Performance Caching, RAID, Scale-out architecture, SMB, Snapshots, Software Defined Storage, Storage Optimization, Tegile, Unified Storage, Virtualization, VMware


Ah, my first blog after Storage Field Day 6!

It was a fantastic week and I only got to fathom the sensations and effects of the trip after my return from San Jose, California last week. Many thanks to Stephen Foskett (@sfoskett), Tom Hollingsworth (@networkingnerd) and Claire Chaplais (@cchaplais) of Gestalt IT for inviting me over for that wonderful trip 2 weeks ago. Tegile was one of the companies I had the privilege to visit and savour.

In a world of utterly confusing messaging about Flash Storage, I was eager to find out what makes Tegile tick at the Storage Field Day session. Yes, I loved Tegile and the campus visit was very nice. I was also very impressed that they have more than 700 customers and over a thousand systems shipped, all within 2 years of coming out of stealth in 2012. However, I was more interested in the essence of Tegile and what makes them stand out.

I have been a long-time admirer of ZFS (Zettabyte File System). I have been a practitioner myself, and I also studied the file system architecture and data structures some years back, when NetApp and Sun were involved in a lawsuit. A lot has changed since then, and I am very pleased to see Tegile doing great things with ZFS.

Tegile’s architecture is called IntelliFlash. Here’s a look at the overview of the IntelliFlash architecture:

So, what stands out for Tegile? I deduce that there are 3 important technology components that define the Tegile IntelliFlash™ Operating System:

MASS (Metadata Accelerator Storage System)
Media Management
Inline Compression and Inline Deduplication

What is MASS? Tegile has patented MASS, an architecture that provides an optimized data path to the file system metadata.

In a typical file system, metadata is often stored together with the data. This results in less optimized data access, because data and metadata are given the same priority. Tegile’s MASS, however, writes and stores the filesystem metadata in very high speed, low latency DRAM and Flash SSD. The filesystem metadata probably includes some very fine-grained and intimate details about the mapping of blocks and pages to the respective capacity Flash SSDs and mechanical HDDs. (Note: I made an educated guess here and I would be happy if someone corrected me)

Going a bit deeper, the DRAM in the Tegile hybrid storage array is used as an L1 Read Cache, while Flash SSDs are used as an L2 Read and Write Cache. Tegile takes further care that the Flash SSDs used for this caching are different from the denser, higher-capacity Flash SSDs used for storing data. These caching SSDs are obviously the faster, lower-latency eMLC type, and in the future they might be replaced by PCIe Flash optimized by NVMe.

This approach gives absolute priority to, and near-instant access to, the filesystem’s metadata, making Tegile data access incredibly fast and efficient.
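A two-level read cache of this kind can be sketched in Python as follows. It is a toy under stated assumptions: both levels use plain LRU, L1 evictions are demoted into L2 rather than dropped, and the write side of the L2 cache is left out. This loosely resembles how ZFS feeds its SSD-based L2ARC from blocks about to leave the DRAM-based ARC, but ZFS uses the ARC algorithm rather than LRU, and nothing here is Tegile’s code.

```python
from collections import OrderedDict
from collections.abc import Callable


class TwoTierReadCache:
    """Toy read cache: L1 in "DRAM", L2 on "SSD", both plain LRU.

    Blocks evicted from L1 are demoted into L2 instead of being dropped,
    and an L2 hit promotes the block back into L1.
    """

    def __init__(self, l1_blocks: int, l2_blocks: int,
                 backend: Callable[[int], bytes]) -> None:
        self.l1: OrderedDict[int, bytes] = OrderedDict()
        self.l2: OrderedDict[int, bytes] = OrderedDict()
        self.l1_blocks, self.l2_blocks = l1_blocks, l2_blocks
        self.backend = backend  # slow capacity tier (HDD), addressed by block number
        self.hits = {"l1": 0, "l2": 0, "disk": 0}

    def _put_l1(self, block: int, data: bytes) -> None:
        self.l1[block] = data
        self.l1.move_to_end(block)
        if len(self.l1) > self.l1_blocks:
            old, old_data = self.l1.popitem(last=False)  # LRU victim leaves DRAM
            self.l2[old] = old_data                      # ...and is demoted to SSD
            self.l2.move_to_end(old)
            if len(self.l2) > self.l2_blocks:
                self.l2.popitem(last=False)              # SSD full: drop its LRU block

    def read(self, block: int) -> bytes:
        if block in self.l1:
            self.hits["l1"] += 1
            self.l1.move_to_end(block)
            return self.l1[block]
        if block in self.l2:
            self.hits["l2"] += 1
            data = self.l2.pop(block)  # promote back into DRAM
        else:
            self.hits["disk"] += 1
            data = self.backend(block)
        self._put_l1(block, data)
        return data


cache = TwoTierReadCache(l1_blocks=2, l2_blocks=4, backend=lambda b: f"block-{b}".encode())
for b in [1, 2, 3, 1, 2, 3, 4, 4]:
    cache.read(b)
print(cache.hits)  # {'l1': 1, 'l2': 3, 'disk': 4}
```

With a DRAM tier of only 2 blocks, the re-reads of blocks 1 to 3 would all have gone back to disk; the SSD tier catches them instead.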

Tegile’s Media Management capabilities excite me. It treats every single Flash SSD in the storage array with a very precise organization of 3 types of data patterns:

Write caching, which is high I/O, is focused on a small segment of the drive
Metadata caching, which has both Read and Write I/O, is targeted at a slightly larger segment of the drive
Data is laid out on the rest of the capacity of the drive
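The segment split can be pictured as carving a drive’s address space into three page-aligned ranges. The percentages in this Python sketch (2% write cache, 5% metadata) and the 800 GB drive are purely hypothetical; Tegile does not publish them in the material above.

```python
def segment_drive(capacity: int, write_cache_pct: int, metadata_pct: int,
                  align: int = 4096) -> dict[str, tuple[int, int]]:
    """Split a drive's byte range into write-cache, metadata and data segments.

    Segment sizes are rounded down to `align`, so every segment starts on a page boundary.
    """
    if write_cache_pct + metadata_pct >= 100:
        raise ValueError("percentages must leave room for data")
    wc_end = capacity * write_cache_pct // 100 // align * align
    md_end = wc_end + capacity * metadata_pct // 100 // align * align
    return {
        "write_cache": (0, wc_end),    # small, hot, over-provisioned
        "metadata": (wc_end, md_end),  # slightly larger, mixed read/write
        "data": (md_end, capacity),    # everything else
    }


segs = segment_drive(800 * 10**9, write_cache_pct=2, metadata_pct=5)
for name, (start, end) in segs.items():
    print(f"{name:<12} {start / 1e9:6.1f} GB -> {end / 1e9:6.1f} GB")
```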

Drilling deeper, the high-I/O writes of the write cache (item 1 above) are targeted at a segment of the drive that is over-provisioned for greater efficiency and care. At the same time, garbage collection (GC) of this segment is handled by the drive’s own controller. This is important because the controller performs GC without adding unnecessary latency to the storage array’s processing cycles, giving a further boost to Tegile’s already awesome prowess.

In addition, IntelliFlash™ aligns every block and every page exactly to the segment and page boundaries of the drives. This reduces block and page fragmentation, and thereby reduces issues with file locality and free-block locality. It also automatically adjusts its block and page alignment to different drive types and models. Therefore, I believe, it would know how to align itself to 512-byte or 520-byte sector drives.
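Here is why alignment matters, as a small Python sketch. It assumes a 4 KiB flash page and the common convention that a 520-byte sector carries 512 bytes of data plus 8 bytes of protection information (checksums or T10 DIF). A 4 KiB write that starts halfway into a page touches two pages instead of one, which can force a read-modify-write.

```python
PAGE = 4096


def align_down(offset: int, boundary: int = PAGE) -> int:
    return offset // boundary * boundary


def align_up(offset: int, boundary: int = PAGE) -> int:
    return -(-offset // boundary) * boundary


def pages_touched(offset: int, length: int, page: int = PAGE) -> int:
    """Number of flash pages an I/O of `length` bytes at `offset` spans."""
    return (align_up(offset + length, page) - align_down(offset, page)) // page


def media_bytes(data_bytes: int, sector_size: int, payload: int = 512) -> int:
    """Bytes consumed on media; 520-byte sectors keep 8 extra bytes per 512 of data."""
    sectors = -(-data_bytes // payload)
    return sectors * sector_size


print(pages_touched(4096, 4096), pages_touched(2048, 4096))  # 1 2
print(media_bytes(PAGE, 512), media_bytes(PAGE, 520))        # 4096 4160
```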

The Media Management function also has advanced cell care. Wear-leveling reaches a new level of advancement, where the efficient organization of blocks and pages on the drives reduces additional, and often unnecessary, erases and rewrites. Furthermore, Inline Compression and Inline Deduplication also reduce the number of writes to the drive media, increasing their longevity.
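The effect of data reduction on endurance is simple division. This Python sketch uses hypothetical numbers (a drive rated for 3,500 TB written, 4 TB/day of host writes, 2:1 dedupe and 1.5:1 compression) and ignores the drive’s own internal write amplification.

```python
def physical_writes(host_tb: float, dedupe_ratio: float, compression_ratio: float) -> float:
    """TB that actually reach the SSDs after inline dedupe and compression."""
    return host_tb / (dedupe_ratio * compression_ratio)


def drive_life_years(endurance_tbw: float, host_tb_per_day: float,
                     dedupe_ratio: float = 1.0, compression_ratio: float = 1.0) -> float:
    """Years until the rated TBW is consumed (ignores drive-internal write amplification)."""
    daily = physical_writes(host_tb_per_day, dedupe_ratio, compression_ratio)
    return endurance_tbw / daily / 365


print(f"{drive_life_years(3500, 4):.1f} years without data reduction")  # 2.4
print(f"{drive_life_years(3500, 4, 2.0, 1.5):.1f} years with 2:1 dedupe and 1.5:1 compression")  # 7.2
```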

Compression and deduplication are 2 very important technology features in almost all flash arrays. Likewise, these 2 technologies are crucial to the performance of Tegile storage systems. Both are performed inline, i.e. Inline Compression and Inline Deduplication, and therefore both are boosted by the multi-core CPUs as well as the fast DRAM memory.

I don’t have the secret sauce formula for how Tegile designed their inline compression and deduplication, but there’s a very good article on how Tegile views their method of data reduction through compression and deduplication. Check out their blog here.

The metadata of each and every customer’s data access probably feeds into Intellicare, their cloud-based customer care program. Intellicare is another strong differentiator in Tegile’s offering.

Oh, did I mention they are unified storage as well, with both SAN and NAS, including SMB 3.0 support?

I left Tegile that afternoon on November 5th feeling happy. I was pleased to catch up with Narayan Venkat, my old friend from NetApp, who is now their Chief Marketing Officer. I was equally pleased to see Tegile advancing ZFS further than the others I have known. With so much technological advancement and more coming, the world is their oyster.

How valuable is your data anywhere?

By cfheoh | October 22, 2014 - 8:32 pm |October 22, 2014 Data, Data Availability, Data Management, Filesystems, Riverbed, Security, Unified Storage

Praying to the hypervisor God

By cfheoh | October 5, 2014 - 2:25 pm |October 5, 2014 10Gigabit Ethernet, API, Appliance, Cisco, Cloud, Datacore, Deduplication, Dell, Disks, EMC, Filesystems, Gartner, HDS, HP, IBM, Microsoft, NetApp, Nexenta, Nutanix, NVMe, Open Compute Project, Openstack, Oracle, PCIe, Scale-out architecture, ScaleMP, Server SAN, Simplivity, SNIA, Software Defined Storage, Software-defined Datacenter, Solaris, Solid State Devices, Storage Market Share, Storage Optimization, Storage Tiering, Tintri, Violin Memory, Virident, Virsto, Virtualization, VMware


I was reading a great article by Frank Denneman about storage intelligence moving up the stack. It was pretty much in line with what I have been observing in the past 18 months or so, about the storage pendulum having swung back to DAS (direct attached storage). To be more precise, the DAS form factor I am referring to is physical server hardware that houses many disk drives.

Like it or not, the hypervisor has become the center of the universe in the IT space. VMware has become the indomitable force in hypervisor technology, with Microsoft Hyper-V playing catch-up. The seismic shift towards these 2 hypervisor technologies is leading storage vendors to place them on the altar and revere them as deities. The others, the likes of Xen and KVM, and to a lesser extent Solaris Containers, aren’t really worth mentioning.

This shift, as the pendulum swings from networked storage back to internal “direct-attached” storage, is dictated by 4 main technology factors:

The x86 server architecture
Software-defined
Scale-out architecture
Flash-based storage technology

Anyone remember Thumper? Not the Disney character from the Bambi movie!

When the SunFire X4500 (aka Thumper) was first released in (intermission: checking Wiki for the right year) 2006, I felt the significant wound it inflicted on the networked storage industry. Instead of the usual 4-8 hard disk drives in all the industry servers at the time, the X4500’s 4U chassis housed 48 hard disk drives. The design and architecture were so astounding to me that I even went and bought a 1U SunFire X4150 for my personal server collection. Such was my adoration for Sun’s technology at the time.


Technology prowess of Riverbed SteelFusion

By cfheoh | September 20, 2014 - 7:20 am |September 22, 2014 Appliance, Backup, Data, Data Availability, Data Corruption, Deduplication, EMC, Fibre Channel, Filesystems, Hyperconvergence, iSCSI, NetApp, Performance Caching, Reliability, Riverbed, ROBO, Snapshots, Software-defined Datacenter, Storage Optimization, Storage Tiering, Unified Storage, Virtualization, VMware


The Riverbed SteelFusion (aka Granite) impressed me the moment it was introduced to me 2 years ago. I remember that genius light-bulb moment well, in December 2012 to be exact, and it left its mark on me. Like I said last week in my previous blog, the SteelFusion technology is unique in the industry so far and has differentiated itself from its WAN optimization competitors.

To further understand the ability of Riverbed SteelFusion, a deeper inspection of the technology is essential. I am fortunate to be given the opportunity to learn more about SteelFusion’s technology and here I am, sharing what I have learned.

What does the technology of SteelFusion do?

Riverbed SteelFusion takes SAN volumes from supported storage vendors in the central datacenter and projects these storage volumes (aka LUNs) to applications and hosts at the remote branches. The technology requires a paired relationship between SteelFusion Core (in the central datacenter) and SteelFusion Edge (at the branch). SteelFusion Core and Edge are each fronted by a Riverbed SteelHead WAN optimization device, to deliver the required performance.
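To make “projecting” a LUN more concrete, here is a deliberately simplified Python model. It assumes the Edge keeps a local cache of blocks, fetches misses from the Core across the WAN, acknowledges writes locally and forwards them to the Core later. Those internals are my assumption for illustration; the class names are hypothetical, and this is not Riverbed’s design or API.

```python
class CoreLUN:
    """Hypothetical stand-in for the authoritative LUN in the central datacenter."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        self.blocks = dict(blocks)
        self.wan_reads = 0

    def read(self, lba: int) -> bytes:
        self.wan_reads += 1
        return self.blocks.get(lba, bytes(512))  # unwritten blocks read as zeros

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data


class EdgeProjection:
    """Hypothetical branch-side view of a Core LUN: local cache plus a write-back queue."""

    def __init__(self, core: CoreLUN) -> None:
        self.core = core
        self.cache: dict[int, bytes] = {}
        self.pending: dict[int, bytes] = {}  # written locally, not yet sent to the Core

    def read(self, lba: int) -> bytes:
        if lba not in self.cache:
            self.cache[lba] = self.core.read(lba)  # miss: one trip across the WAN
        return self.cache[lba]

    def write(self, lba: int, data: bytes) -> None:
        self.cache[lba] = data  # acknowledged at LAN speed
        self.pending[lba] = data

    def flush(self) -> int:
        """Send queued writes to the Core; return how many blocks were sent."""
        for lba, data in self.pending.items():
            self.core.write(lba, data)
        sent = len(self.pending)
        self.pending.clear()
        return sent


core = CoreLUN({0: b"boot", 1: b"data"})
edge = EdgeProjection(core)
edge.read(0)
edge.read(0)
edge.write(1, b"new")
print(core.wan_reads, edge.flush(), core.blocks[1])  # 1 1 b'new'
```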

The diagram below gives an overview of what the entire SteelFusion network architecture looks like:


No Flash in the pan

By cfheoh | June 9, 2014 - 10:29 am |June 9, 2014 10Gigabit Ethernet, Cloud, Data, Data Availability, Datacore, Disks, Filesystems, Microsoft, NAS, NVMe, PCIe, Performance Benchmark, Performance Caching, Reliability, SATA, SCSI, Server SAN, Software Defined Storage, Solid State Devices, Storage Optimization, Unified Storage, VDI, Virident, Virsto, Virtualization, VMware

(picture courtesy of http://electronicdesign.com/memory/evolution-solid-state-storage-enterprise-servers)

Right at the top, we have the CPU/Memory complex (labelled as Processor). Our applications, or at least bits and pieces of them, run in this CPU/Memory complex.

Therefore, we can see Pattern #1 showing up.
