First looks into Interplanetary File System

The cryptocurrency craze has elevated another strong candidate in recent months. Filecoin is becoming the leading voice of a decentralized Internet, the next-generation Web 3.0. In this blog, I am not going to write much about the Filecoin frenzy, but rather about the underlying distributed file system that powers this phenomenon: the Interplanetary File System.

[ Note: This is still a very new area for me, and the rest of the content of this blog is still nascent and developing ]

Interplanetary File System

Tremulous Client-Server web architecture

Almost the entire Internet architecture is client-server. Clients such as browsers and apps connect to web services served from a collection of servers. As Web 3.0 approaches (some say it is already here), the client-server model is no longer perceived as the Internet architecture of choice. With billions and billions of users, applications and devices relying solely on a centralized service, any failure or compromise of that service becomes far-reaching, and the reasons for decentralizing away from the client-server architecture of the Internet are cogent.

This reliance on centralization can lead to dire consequences. Just last week, some of us read about WD My Book Live NAS devices that executed a factory reset and deleted all the data on them. Whether this happened en masse or not, it showed one of the burgeoning vulnerabilities of a centralized client-server architecture.

IPFS decentralized storage network

Interplanetary File System (IPFS) is a decentralized storage network where clients can store and retrieve data, just as we do today with Google® Photos, Dropbox® and many other cloud storage services. However, clients use the IPFS protocol’s add and get operations (loosely similar to HTTP/S PUT and GET) against the distributed peer-to-peer storage network. Using the diagram below as reference, the clients are the IPFS cubes in the top left corner (Add Data = storing data using ipfs add) and the bottom right corner (Request Data = retrieving data using ipfs get).

Interplanetary File System (IPFS) Peer-to-Peer Network

The circular structure in the middle of the diagram is the P2P network, where the IPFS nodes exchange information and share data using the IPFS protocols. Each node holds one or more structures (either in part or in entirety) of a few distributed hash tables (DHT). These house key-value information in a lookup table that links each and every object’s content identifier, the CID (the key), with the hashed value of the object. For simplicity’s sake, each node is considered a Merkle Directed Acyclic Graph (DAG) node, an implementation variant of the Merkle tree, and is immutable.
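To make the lookup-table idea a little more concrete, here is a minimal sketch of a toy DHT that runs inside a single Python process. It is not how IPFS implements its DHT. The class name ToyDHT, the peer names, and the choice to store "which peers hold this content" as the value are all my own assumptions for illustration. The one idea borrowed from Kademlia-style DHTs is that a key is kept on the peer whose ID is closest to it by XOR distance.

```python
import hashlib


def key_id(name: bytes) -> int:
    """Map arbitrary bytes to a 256-bit integer ID using SHA-256."""
    return int.from_bytes(hashlib.sha256(name).digest(), "big")


class ToyDHT:
    """Toy, single-process stand-in for a Kademlia-style DHT (hypothetical)."""

    def __init__(self, peer_names):
        # Each peer holds only its own slice of the overall key-value table.
        self.tables = {name: {} for name in peer_names}
        self.ids = {name: key_id(name.encode()) for name in peer_names}

    def closest_peer(self, key: str) -> str:
        """Pick the peer whose ID has the smallest XOR distance to the key."""
        target = key_id(key.encode())
        return min(self.ids, key=lambda name: self.ids[name] ^ target)

    def provide(self, cid: str, provider: str) -> None:
        """Record that `provider` holds the content identified by `cid`."""
        holder = self.closest_peer(cid)
        self.tables[holder].setdefault(cid, set()).add(provider)

    def find_providers(self, cid: str) -> set:
        """Ask the responsible peer which providers hold `cid`."""
        holder = self.closest_peer(cid)
        return set(self.tables[holder].get(cid, set()))


dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
# Assumption: a plain SHA-256 hex digest stands in for a real CID.
cid = hashlib.sha256(b"cat photo bytes").hexdigest()
dht.provide(cid, "peer-b")
dht.provide(cid, "peer-d")
print(sorted(dht.find_providers(cid)))  # ['peer-b', 'peer-d']
```

No single peer holds the whole table here. Any peer can work out which peer is responsible for a key from the key alone, and that is what lets the table be spread across the network.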

From another angle, the diagram below shows how clients [ in (2) ] interact with the IPFS P2P Network [ in (3) ], where each IPFS cube (called a Merkle DAG node) houses and dynamically updates the Distributed Hash Table, DHT [ in (4) ], as everything from objects and key-value pairs to nodes changes continuously.

The blockchain aspect [ in (1) ] is not discussed in this blog, but it is the foundational storage and retrieval incentivization system for Filecoin. Terms like storage miners and retrieval miners are now moving out of the technobabble and into the mainstream speak of the cryptocurrency players.

IPFS DHT Key-to-Block Storage example

Content-based addressing

Now that I have lightly explained the IPFS decentralized storage network, we look at IPFS hashing, which is shown at (5) in the diagram above.

In the client-server architecture model mentioned earlier, the client must be able to identify and recognize the location of the server. This could be a URL (Uniform Resource Locator) for HTTP/S such as https://www.example.com. The domain name www.example.com is an FQDN (fully qualified domain name) resolved by the DNS (Domain Name System), which translates the human-readable www.example.com into an IP address. Thus, the Internet ecosystem has, for decades, run pretty much on a location-based addressing system.
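As a small illustration of location-based addressing, the sketch below splits a URL into the host the client must reach and the path it asks for. The DNS_TABLE dictionary and the IP address in it are made up for the example (the address comes from a range reserved for documentation). A real client would query DNS servers instead.

```python
from urllib.parse import urlsplit

# Hypothetical static DNS table; a real resolver would query DNS servers.
DNS_TABLE = {"www.example.com": "203.0.113.10"}


def locate(url: str) -> tuple:
    """Return (ip_address, path): where the client must go, not what it gets."""
    parts = urlsplit(url)
    ip_address = DNS_TABLE[parts.hostname]
    return ip_address, parts.path or "/"


print(locate("https://www.example.com/cat.jpg"))  # ('203.0.113.10', '/cat.jpg')
```

Nothing in the address describes the bytes of cat.jpg. If the server swaps the file or goes offline, the same URL returns something else or nothing at all.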

Borrowing from the BitTorrent concept, files are shared via the P2P network protocol described in IPFS. The files or, in IPFS speak, the data objects are sharded and hashed into a Merkle DAG. The diagram below shows how a cat picture is sharded into multiple commit objects. A CID (content identifier) is computed for tagging and identifying the content.

IPFS sharding and hashing a cat photo

Each commit object is 256 KB. Data larger than 256 KB will be sharded into multiple chunks, which are then chained sequentially through metadata links from object to object, until the last piece, which may be smaller than 256 KB.
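The sharding and hashing described above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the actual IPFS encoding. It uses plain SHA-256 hex digests where IPFS uses CIDs, and it builds a single root object that links to every chunk. The names chunk and build_dag are hypothetical.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size quoted in the text


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_dag(data: bytes, size: int = CHUNK_SIZE) -> dict:
    """Hash every chunk, then hash the ordered chunk hashes into a root."""
    leaves = [hashlib.sha256(piece).hexdigest() for piece in chunk(data, size)]
    root = hashlib.sha256("".join(leaves).encode()).hexdigest()
    return {"root": root, "links": leaves}


photo = bytes(600 * 1024)  # stand-in for a 600 KB cat photo (all zero bytes)
dag = build_dag(photo)
print(len(dag["links"]))  # 3 chunks: 256 KB + 256 KB + 88 KB
```

Because the root hash is computed from the chunk hashes, changing a single byte anywhere in the photo changes the root as well. That is why the whole object can be identified by one identifier.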

The CID represents the object. Once it is requested (for example via ipfs cat or ipfs get), the network collects and reassembles the content from the nodes that hold it, using the CID as the “address label” of the content. This is IPFS content addressing, and this is where it differs from the location-based addressing of many client-server architectures. With the content address, the client can get to the data objects without needing to know where the data objects are located. This removes the centralization of a data storage service provider and partitions the entire content across the distributed, decentralized IPFS network. This leads to inherent advantages such as deduplication of contents (further space savings), versioning (data permanence), immutability and, most of all, security.
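Here is a minimal sketch of why content addressing gives deduplication and tamper detection almost for free. The ContentStore class is hypothetical and lives in one process. It uses a SHA-256 hex digest as the address, which is a simplification of a real CID.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the bytes."""

    def __init__(self):
        self.blocks = {}

    def add(self, data: bytes) -> str:
        """Store data under its SHA-256 digest and return that digest."""
        address = hashlib.sha256(data).hexdigest()
        self.blocks[address] = data  # identical content lands on the same key
        return address

    def get(self, address: str) -> bytes:
        """Fetch data and verify that it still matches its address."""
        data = self.blocks[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
first = store.add(b"cat photo")
second = store.add(b"cat photo")     # same bytes, same address: deduplicated
edited = store.add(b"cat photo v2")  # changed bytes get a new address
print(first == second, first != edited, len(store.blocks))  # True True 2
```

The old version stays reachable under its old address, which is the versioning and immutability angle. Anyone who fetches a block can re-hash it to check that it was not altered along the way.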

Decentralization beyond Cryptocurrencies

Cryptocurrencies based on Proof-of-Space (PoS) and Proof-of-Space-Time (PoST) have been gaining in popularity in the past year, reaching a crescendo in the last 2 months. Chia coin has been the loudest, and I blogged about it a few weeks ago. But Filecoin’s ICO (initial coin offering) should not be discounted either, and not just for its financial prospects, because the Internet has now become a realm of social and political contention. This contention is acutely exacerbated by the rampant, ever more devastating impact of ransomware and cybersecurity incidents of late.

With these in mind, decentralization presents a strong case to supplant the current state of the Internet. IPFS is in the leading pack of the decentralization conversation.


About cfheoh

I am a technology blogger with 25+ years of IT experience. I write heavily on technologies related to storage networking and data management because that is my area of interest and expertise. I introduce technologies with the objective of getting readers to *know the facts*, and to use that knowledge to cut through the marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress. I am involved in SNIA (Storage Networking Industry Association) and, as of October 2013, I have been appointed as the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I currently run a small system integration and consulting company focusing on storage and cloud solutions, with occasional consulting work on high performance computing (HPC).
