10Gigabit Ethernet Archives

How well do you know your data and the storage platform that processes the data

By cfheoh | December 20, 2021 - 7:30 am | December 20, 2021

I/O Characteristics

Applications and workloads (A&W) read from and write to data storage service platforms. These could be local DAS (direct attached storage), networked storage arrays in SAN and NAS, object storage, or cloud storage services. Whether the data is structured or unstructured, different A&Ws have different I/O patterns when accessing data from storage. Storage therefore has to be configured to match these patterns as closely as possible, so that it can perform optimally for these A&Ws. Without going into deep details, here are a few to think about:

Random and Sequential patterns
Block sizes of these A&Ws, typically ranging from 4K to 1024K
Causal effects of synchronous and asynchronous I/Os to and from the storage
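
The first two characteristics in the list can be made concrete with a minimal sketch. It assumes a hypothetical trace of (offset, size) pairs, which is not something from the original post, and simply measures how many requests start exactly where the previous one ended, along with the average block size. Real tracing tools report far more than this; the function names here are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical trace format (an assumption): (byte offset, request size in bytes)
IORequest = Tuple[int, int]


def sequential_fraction(trace: List[IORequest]) -> float:
    """Fraction of requests that begin exactly where the previous one ended."""
    if len(trace) < 2:
        return 0.0
    hits = 0
    for (prev_off, prev_size), (off, _size) in zip(trace, trace[1:]):
        if off == prev_off + prev_size:
            hits += 1
    return hits / (len(trace) - 1)


def mean_block_size_kib(trace: List[IORequest]) -> float:
    """Average request size in KiB."""
    return sum(size for _off, size in trace) / len(trace) / 1024


# Illustrative traces: large sequential reads vs small scattered reads.
seq_trace = [(i * 1024 * 1024, 1024 * 1024) for i in range(8)]            # 1024K blocks
rand_trace = [(n * 4096, 4096) for n in (90, 3, 57, 12, 44, 71, 8, 30)]   # 4K blocks

print(sequential_fraction(seq_trace), mean_block_size_kib(seq_trace))
print(sequential_fraction(rand_trace), mean_block_size_kib(rand_trace))
```

A trace that scores close to 1.0 behaves like streaming or backup, while one close to 0.0 with small blocks looks more like a transactional database. These are the kinds of differences the storage configuration has to match.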

A Paean to NFS

By cfheoh | September 21, 2020 - 9:15 am | September 20, 2020

The Network is Still the Computer

By cfheoh | October 22, 2018 - 11:35 am | October 22, 2018

[Preamble: I was invited by GestaltIT as a delegate to their Tech Field Day from Oct 17-19, 2018 in Silicon Valley, USA. My expenses, travel and accommodation were covered by GestaltIT, the organizer, and I was not obligated to blog or promote the technologies presented at this event. The content of this blog reflects my own opinions and views.]

Sun Microsystems coined the phrase “The Network is the Computer”. It became one of the most powerful ideologies in the computing world. Over the years, many technology companies have tried to emulate and practise the mantra, but fell short.

I had never heard of Drivescale. It wasn’t on my radar until the legendary NFS guru, Brian Pawlowski, joined them in April this year. Beepy, as he is known, was CTO of NetApp and later at Pure Storage, and held many technology leadership roles, including leading the development of NFSv3 and v4.

Prior to Tech Field Day 17, I was given some “homework”. Stephen Foskett, Chief Cat Herder (as he is known) of Tech Field Days and Storage Field Days, highly recommended Drivescale and asked the delegates to pick up some notes on their technology. Going through a couple of the videos, Drivescale’s message and philosophy resonated well with me. Perhaps it was their Sun Microsystems DNA? Many of the Drivescale team members were from Sun, and I was previously from Sun as well. I was drinking Sun’s Kool Aid by the bucket loads even before I graduated in 1991, and so what Drivescale preached made a lot of sense to me.

Drivescale is all about Scale-Out Architecture at the webscale level, to address the massive scale of data processing. To understand it more deeply, we must think about “Data Locality” and “Data Mobility”. I frequently use these 2 “points of discussion” in my consulting practice when architecting and designing data center infrastructure. The gist of data locality is simple – the closer the data is to the processing, the cheaper, lighter and more efficient the processing gets. Moving data – the data mobility part – is expensive.

The rise of RDMA

By cfheoh | May 8, 2017 - 2:42 pm | July 19, 2017

I have known of RDMA (Remote Direct Memory Access) for quite some time, but never in depth. Since my contract work ended last week and I have some time off for personal development, I decided to look deeper into RDMA. Why RDMA?

In the past year or so, RDMA has been appearing on my radar very frequently, and rightly so. The speedy development and adoption of NVMe (Non-Volatile Memory Express) have pushed All Flash Arrays to the next level. This pushes the I/O and throughput performance bottlenecks away from the NVMe storage medium and into the legacy world of SCSI.

Most storage interfaces and protocols in use today, like SAS, iSCSI and Fibre Channel, still carry SCSI commands (SATA carries ATA commands instead), so traffic would have to be translated between NVMe and SCSI. NVMe-to-SCSI bridges have to be present to facilitate the translation.

In the slide below, shared at the Flash Memory Summit, there were numerous red boxes which laid out the SCSI connections and interfaces where SCSI-to-NVMe translation (and vice versa) would be required.

The engineering of Elastifile

By cfheoh | March 9, 2017 - 8:51 pm | March 9, 2017

[Preamble: I was a delegate of Storage Field Day 12. My expenses, travel and accommodation were paid for by GestaltIT, the organizer, and I was not obligated to blog or promote the technologies presented in this event]

When it comes to large scale storage capacity requirements with distributed cloud and on-premise capability, object storage is all the rage. Amazon Web Services started the object-based S3 storage service more than a decade ago, and the romance with object storage started.

Today, there are hundreds of object-based storage vendors out there, touting feature after feature of invincibility. But after researching and reading through many design and architecture papers, I found that many object-based storage technology vendors began to sound the same.

In the back of my mind, object storage is not easy when it comes to integrating most applications. Yes, there is a new breed of cloud-based applications with RESTful CRUD API operations to access object storage, but most applications still rely on file systems to access storage for capacity, performance and protection.

These CRUD and CRUD-like APIs are the common semantics for interfacing with object storage platforms. But many, many real-world applications do not have the object semantics to interface with storage. They are mostly designed to interface and interact with file systems, and secretly, I believe many application developers and users want a file system interface to storage. It does not matter if the storage is on-premise or in the cloud.

Let’s not kid ourselves. We are most natural when we work with files and folders.

Implementing object storage also denies us the ability to optimally utilize Flash and solid state storage on-premise when the compute is in the cloud. Similarly, when the compute is on-premise and the flash-based object storage is in the cloud, you get a mismatch of performance and availability requirements as well. In the end, there has to be a compromise.

Another “feature” of object storage is its poor ability to handle transactional data. Most object storage platforms do not allow modification of data once the object has been created. Putting a NAS front end (aka a NAS gateway) on it does not take away the fact that it is still object-based storage at the very core of the infrastructure, regardless of whether it is on-premise or in the cloud.

Resiliency, latency and scalability are the greatest challenges when we want to build a true globally distributed storage or data services platform. Object storage can be resilient and it can scale, but it has to compromise performance and latency to be so. And managing object storage will not be as natural as managing a file system with folders and files.

Enter Elastifile.

Considerations of Hadoop in the Enterprise

By cfheoh | September 9, 2016 - 10:10 pm | September 10, 2016

I am guilty. I have not been tending this blog for quite a while now, but it feels good to be back. What have I been doing? Since leaving NetApp 2 months or so ago, I have been active on the scene again. This time I am more aligned towards data analytics and its burgeoning impact on the storage networking segment.

I was intrigued by an article posted by a friend of mine on Facebook. The article (circa 2013) was titled “Never, ever do this to Hadoop”. It described the author’s gripe with the SAN bigots. I have encountered storage professionals who throw in the SAN solution every time, because that is all they know. NAS, to them, was like that old relative who smelled of camphor oil, and they avoided NAS like the plague. Similarly, DAS was frowned upon, but how things have changed. The pendulum has swung back to DAS, and new market segments such as VSANs and Hyper Converged platforms have been dominating the scene in the past 2 years. I highlighted this in my blog, “Praying to the Hypervisor God”, almost 2 years ago.

I agree with the author, Andrew C. Oliver. The “locality” of resources is central to Hadoop’s performance.

Consider these 2 models:

In the model on your left (Moving Data to Compute), the delivery process from Storage to Compute is HEAVY. That is because data has dependencies; data has gravity. However, if you consider the model on your right (Moving Compute to Data), delivering data processing to the storage layer is much lighter. Compute or data processing is transient, and the data in the compute layer is volatile. Once compute’s power is turned off, everything starts again from a clean slate, hence the volatile state.
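
To put rough numbers on why one model is heavy and the other is light, here is a back-of-the-envelope sketch. The dataset size, job size and link speed are illustrative assumptions, not figures from the article, and the calculation ignores protocol overhead, latency and contention.

```python
# Back-of-the-envelope sketch: ideal time to push bytes over a network link.
# All figures below are illustrative assumptions, not measurements.


def transfer_seconds(size_bytes: int, link_bits_per_sec: int) -> float:
    """Ideal transfer time, ignoring protocol overhead, latency and contention."""
    return size_bytes * 8 / link_bits_per_sec


TEN_GBE = 10 * 10**9          # 10 Gigabit Ethernet, in bits per second
DATASET_BYTES = 10 * 10**12   # hypothetical 10 TB dataset
JOB_BYTES = 50 * 10**6        # hypothetical 50 MB packaged compute job

move_data = transfer_seconds(DATASET_BYTES, TEN_GBE)
move_compute = transfer_seconds(JOB_BYTES, TEN_GBE)

print(f"Moving data to compute: {move_data:,.0f} s (~{move_data / 3600:.1f} h)")
print(f"Moving compute to data: {move_compute:.2f} s")
```

Under these assumptions, shipping the dataset takes about 8,000 seconds (a little over 2 hours) on a perfectly utilised 10GbE link, while shipping the job takes a fraction of a second. The exact figures matter less than the gap of several orders of magnitude.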

The reverse wars – DAS vs NAS vs SAN

By cfheoh | March 13, 2015 - 9:54 am | March 13, 2015

It has been quite an interesting 2 decades.

In the beginning (starting in the early to mid-90s), SAN (Storage Area Network) was the dominant architecture. DAS (Direct Attached Storage) was on the wane, as the channel-like throughput of the Fibre Channel protocol, coupled with the million-device addressing of FC, obliterated parallel SCSI, which was only able to handle 16 devices and throughput of up to 80 (later 160 and 320) MB/sec.

NAS, defined by the CIFS/SMB and NFS protocols, was happily chugging along on the 100 Mbit/sec network, and occasionally getting sucked into arguments about why SAN was better than NAS. I was already heavily into NFS, because I was pretty much a SunOS/Solaris bigot back then.

When I joined NetApp in Malaysia in 2000, the NAS-SAN wars were going on, waiting for me. NetApp (or Network Appliance as it was known then) was trying to grow beyond its dot-com roots into the enterprise space, and guys like EMC and HDS were frequently trying to put NetApp down.
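
The scale of that addressing gap is easy to check with a little arithmetic. The sketch below assumes the 24-bit fabric port address used by Fibre Channel, a detail that comes from the FC standards rather than from this post, and compares it with the 16 device IDs mentioned above.

```python
# Addressing arithmetic sketch.
FC_ADDRESS_BITS = 24   # assumption: 24-bit Fibre Channel fabric port address (not stated in the post)
SCSI_DEVICE_IDS = 16   # parallel SCSI device limit, as stated in the text

fc_addresses = 2 ** FC_ADDRESS_BITS

print(f"Fibre Channel address space: {fc_addresses:,}")
print(f"Ratio to parallel SCSI IDs:  {fc_addresses // SCSI_DEVICE_IDS:,}x")
```

That works out to 16,777,216 theoretical addresses, roughly a million times the 16 IDs on a parallel SCSI bus. Not every address is usable for end devices in practice, so treat this as an upper bound.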

“It’s a toy…” was the most common jibe I got in regular engagements until EMC suddenly decided to attack Network Appliance directly with their EMC CLARiiON IP4700. EMC guys would fondly remember this as the “NetApp killer”.

Why demote archived data access?

By cfheoh | March 10, 2015 - 1:21 pm | March 10, 2015

We are all familiar with the concept of data archiving. Passive data gets archived from production storage and is migrated to a slower and, often, cheaper storage medium such as tapes or SATA disks. Hence the terms nearline and offline data were created. With that, IT constantly reminds users that archived data is infrequently accessed, and therefore they have to accept slower access to passive, archived data.

Business conditions have certainly changed, because the need for data to be 100% online is becoming more relevant. The new competitive nature of businesses dictates that data must be at the fingertips, because speed and agility are the new competitive advantage. Often the total amount of data, production and archived, runs into hundreds of TBs, even into petabytes!

The industries I am familiar with – Oil & Gas, and Media & Entertainment – are facing this situation. These industries have a deluge of files and unstructured data in their archives, much of it dormant, inactive and sitting on old tapes of a bygone era. Yet these files and unstructured data have the most potential to be explored, mined and analyzed to realize their value to the organization. In short, the archived data and files must be democratized!

The flip side is, when the archived files and unstructured data are coupled with a slow access interface or unreliable storage infrastructure, the value of archived data is downgraded because of the aggravating friction between the access path, the applications and the business requirements. How would organizations value archived data more if the access path to the archived data is so damn hard???!!!

An interesting solution fell into my lap some months ago, and putting A and B together (A + B), I believe the access path to archived data can be unbelievably high-performance, simple and transparent and, most importantly, remove the BLOODY PAIN of FILE AND DATA MIGRATION! For storage administrators and engineers familiar with data migration, especially if the size of the migration is into hundreds of TBs or even PBs, you know what I mean!

I have known this solution for some time now, because I have been avidly following its development after its founders left NetApp following their Spinnaker venture to start Avere Systems.

Praying to the hypervisor God

By cfheoh | October 5, 2014 - 2:25 pm | October 5, 2014

I was reading a great article by Frank Denneman about storage intelligence moving up the stack. It was pretty much in line with what I have been observing in the past 18 months or so, about the storage pendulum having swung back to DAS (direct attached storage). To be more precise, the DAS form factor I am referring to is physical server hardware that houses many disk drives.

Like it or not, the hypervisor has become the center of the universe in the IT space. VMware has become the indomitable force in hypervisor technology, with Microsoft Hyper-V playing catch-up. The seismic shift brought by these 2 hypervisor technologies is leading storage vendors to place them on the altar and revere them as deities. The others, the likes of Xen and KVM, and to a lesser extent Solaris Containers, aren’t really worth mentioning.

This shift, as the pendulum swings from networked storage back to internal “direct-attached” storage, is dictated by 4 main technology factors:

The x86 server architecture
Software-defined
Scale-out architecture
Flash-based storage technology

Anyone remember Thumper? Not the Disney character from the Bambi movie!

When the SunFire X4500 (aka Thumper) was first released (intermission: checking Wiki for the right year) in 2006, I felt a significant wound inflicted on the networked storage industry. Instead of the usual 4-8 hard disk drives in all the industry servers at the time, the X4500 4U chassis housed 48 hard disk drives. The design and architecture were so astounding to me that I even went and bought a 1U SunFire X4150 for my personal server collection. Such was my adoration for Sun’s technology at the time.

No Flash in the pan

By cfheoh | June 9, 2014 - 10:29 am | June 9, 2014

(picture courtesy of http://electronicdesign.com/memory/evolution-solid-state-storage-enterprise-servers)

Right at the top, we have the CPU/Memory complex (labelled as Processor). Our applications, albeit bytes and pieces of them, run in this CPU/Memory complex.

Therefore, we can see Pattern #1 showing up.
