Learn more about Filebase’s integration with the IPFS network.
What is IPFS?
InterPlanetary File System, also known as IPFS, is a decentralized peer-to-peer protocol for storing and retrieving files or websites.
IPFS is similar to HTTP, the protocol at the heart of how we use and create content on the internet today. However, IPFS is relatively new and offers a different set of attributes and benefits compared to HTTP.
Note: HTTP in this context refers to both HTTP and HTTPS. HTTPS should be used in all production environments to maximize the security and reliability of the content.
To bridge the gap between the two protocols, IPFS HTTP gateways let you access the IPFS network through ordinary HTTP requests, so you can use and build with IPFS from any HTTP client.
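As a rough illustration, requesting content through a gateway is just an ordinary HTTPS request whose path names a CID. The Python sketch below assumes a path-style gateway that serves `https://<gateway-host>/ipfs/<CID>`; the host `ipfs.filebase.io` and the helper names `gateway_url` and `fetch_via_gateway` are illustrative choices, not an official client library.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen

# Assumption: the gateway accepts path-style requests of the form
# https://<gateway-host>/ipfs/<CID>[/<path>]. "ipfs.filebase.io" is used here
# for illustration only; substitute the gateway you actually use.
DEFAULT_GATEWAY = "ipfs.filebase.io"


def gateway_url(cid: str, path: str = "", gateway: str = DEFAULT_GATEWAY) -> str:
    """Build the HTTPS URL that asks an IPFS HTTP gateway for a CID."""
    if not cid:
        raise ValueError("a CID is required")
    full_path = f"/ipfs/{cid}"
    if path:
        # Optional path inside a CID that represents a folder (e.g. a static site).
        full_path += "/" + path.lstrip("/")
    return urlunsplit(("https", gateway, full_path, "", ""))


def fetch_via_gateway(cid: str, path: str = "", timeout: float = 30.0) -> bytes:
    """Retrieve content over plain HTTPS; the gateway performs the IPFS lookup."""
    with urlopen(gateway_url(cid, path), timeout=timeout) as response:
        return response.read()
```

For example, `gateway_url("<your CID>", "index.html")` builds a link to a file inside a folder CID, and `fetch_via_gateway("<your CID>")` downloads the content over HTTPS without any IPFS software installed locally.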
HTTP and The Client-Server Model
Traditionally, when you access a webpage there are multiple protocols working together to deliver this website to you. First, the DNS protocol finds the IP address of the server that is tied to the domain name. Then, HTTP is used to request the website from the host server. This workflow is referred to as the client-server model.
While the client-server model is at the forefront of how we interact with and use the internet today, this model is centralized by design and therefore comes with risks such as unreliability, lack of resilience, and single points of failure. The client-server model places all responsibility on the host server to keep the website available and accessible. If the host server goes down due to an outage, disaster, or hardware failure, the website becomes unreachable.
Most notably, the HTTP protocol only sends your request for the website to the host server and doesn’t send your request to other servers that might be able to respond if the host server is down. This is a fundamental difference between HTTP and IPFS.
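The toy model below sketches that workflow in Python: a lookup table stands in for DNS, and a dictionary of servers stands in for the network. It is a simplification (real DNS can return several addresses, and real clients have timeouts and retries), and every name and address in it is hypothetical; the IPs come from the 203.0.113.0/24 documentation range. It shows the point made above: when the one host is down, the request fails even though a mirror holds the same file.

```python
# Hypothetical DNS table: each hostname resolves to exactly one host.
DNS = {"example.com": "203.0.113.10"}

# Hypothetical servers: the primary is down, and a mirror holds the same file.
SERVERS = {
    "203.0.113.10": {"up": False, "files": {"/index.html": b"<h1>Hello</h1>"}},
    "203.0.113.20": {"up": True, "files": {"/index.html": b"<h1>Hello</h1>"}},
}


class Unreachable(Exception):
    """Raised when the single host behind a hostname cannot answer."""


def http_get(hostname: str, path: str) -> bytes:
    ip = DNS[hostname]       # step 1: DNS resolves the name to one host
    server = SERVERS[ip]     # step 2: the request goes to that host only
    if not server["up"]:
        # No fallback: the mirror at 203.0.113.20 is never consulted.
        raise Unreachable(f"{hostname} ({ip}) is down")
    return server["files"][path]
```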
What About Centralized Cloud Providers?
Another model for hosting or storing websites and files is to use cloud providers, like AWS or Google Cloud. This is a common workflow for many applications, websites, and platforms: cloud providers create redundancy and high availability by deploying these assets across multiple servers through high-level abstractions such as CDNs or storage services. Often, though, these servers are located in a single geographical location, sometimes within the same data center. This means that while there might be more redundancy at the server level, there isn’t necessarily redundancy against data center-wide outages, disasters, or human error such as a cable being disconnected by accident.
These solutions are proprietary to each cloud provider: they are not standardized, open-source, or interoperable, because they are implemented at the hardware or software layer rather than at the protocol layer. This creates hard vendor lock-in that can trap customers with one provider, even if another provider would serve them better.
Another problem with big cloud providers is market concentration. Because each large provider hosts such a significant share of the internet, a single hardware failure, fire, or even human error at one provider can have widespread effects. Outages of this size often affect thousands of websites, services, and platforms, and they can bring down even more services that rely on any of the affected websites.
The Peer-To-Peer Solution
IPFS is a peer-to-peer network protocol. Instead of each client connecting to a host server as in the client-server model, each client (also referred to as a peer or node) connects to other peers, allowing it to act as both a client and a server simultaneously. With this configuration, any peer that has a requested file or website can serve it and contribute to the network’s availability, reliability, and resilience to outages or disruptions. With IPFS, peers pool resources such as storage space and internet bandwidth to ensure that files remain available, resilient to outages, and, most importantly, decentralized.
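To make the difference concrete, here’s a minimal sketch of a peer that acts as both client and server. It is a toy, assuming a tiny in-memory network where each peer knows a list of neighbors and a SHA-256 digest stands in for a real content identifier; the `Peer` class and its methods are hypothetical, not part of any IPFS implementation.

```python
import hashlib
from typing import Optional


def content_id(data: bytes) -> str:
    # Simplification: a bare SHA-256 hex digest stands in for a real IPFS CID.
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Toy peer that plays both roles: server (serve) and client (request)."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.store: dict[str, bytes] = {}
        self.neighbors: list["Peer"] = []

    def add(self, data: bytes) -> str:
        cid = content_id(data)
        self.store[cid] = data
        return cid

    def serve(self, cid: str) -> Optional[bytes]:
        # Server role: answer only if this peer is online and holds the content.
        return self.store.get(cid) if self.online else None

    def request(self, cid: str) -> bytes:
        # Client role: use a local copy first, otherwise ask each neighbor in turn.
        if cid in self.store:
            return self.store[cid]
        for peer in self.neighbors:
            data = peer.serve(cid)
            if data is not None:
                self.store[cid] = data  # keep a copy so this peer can serve it too
                return data
        raise LookupError(f"no reachable peer has {cid}")
```

If peers `b` and `c` both hold a file and `b` goes offline, a request from a third peer is still answered by `c`, and the requester then holds a copy it can serve to others.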
How does IPFS work?
IPFS differs from other decentralized storage networks because it offers additional features and attributes such as content addressing, directed acyclic graphs (DAGs), and distributed hash tables (DHTs).
Unique Data Identification via Content Addressing
Data stored on IPFS is located through its content address rather than its physical location. When data is added to IPFS, it is split into a series of smaller pieces, called blocks, and each block gets its own unique content identifier derived from its hash. These hashes serve as identifiers and are used to link the blocks together so that the complete data can be reassembled.
Identifying an object, such as a file or a node, by the value of its hash is referred to as content addressing. This hash-based identifier is known as the Content Identifier, or CID. When objects are uploaded to an IPFS bucket on Filebase, each object’s IPFS CID is listed in its metadata for easy reference in any tool or application, or for use with an IPFS gateway.
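The sketch below illustrates splitting data into blocks and addressing each block by its hash. It is a simplification: the 4-byte block size is chosen only so the output is easy to inspect (IPFS’s default fixed-size chunker uses 256 KiB blocks), and plain SHA-256 hex digests stand in for real CIDs, which also identify the hash function and data format used.

```python
import hashlib

# Illustrative block size only; real IPFS chunks are far larger.
CHUNK_SIZE = 4


def split_into_blocks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into consecutive fixed-size pieces (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_ids(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Identify each block by the hash of its own bytes: its content address."""
    return [hashlib.sha256(block).hexdigest() for block in split_into_blocks(data, size)]
```

Because identical blocks produce identical identifiers, `block_ids(b"abcdabcd")` returns the same ID twice, and a content-addressed store only needs to keep one copy of that block.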
Content Linking via Directed Acyclic Graphs (DAGs)
Directed acyclic graphs (DAGs) are a hierarchical data structure. A graph is a way to represent objects and the relationships between them. A directed graph is one whose edges have a direction, pointing from one object to another. An acyclic graph is one that contains no cycles: following the edges from any object never leads back to that same object. A family tree, which shows ancestors and their relationships to one another, is a good example of a directed acyclic graph.
In this context, an object in a graph is referred to as a node and an edge refers to the relation between the objects in a graph.
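One way to check the “acyclic” property is Kahn’s topological-sort algorithm: repeatedly remove nodes that have no incoming edges; if every node can be removed, the graph has no cycles. The Python sketch below applies it to a small, hypothetical family tree in which edges point from parent to child.

```python
from collections import deque


def is_acyclic(edges: dict[str, list[str]]) -> bool:
    """Kahn's algorithm: a directed graph is acyclic if and only if every
    node can be removed in topological order."""
    nodes = set(edges) | {child for children in edges.values() for child in children}
    # Count incoming edges for every node.
    indegree = {node: 0 for node in nodes}
    for children in edges.values():
        for child in children:
            indegree[child] += 1
    # Start with the nodes nothing points to (e.g. the oldest ancestors).
    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in edges.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any node left over is part of a cycle.
    return removed == len(nodes)


family_tree = {
    "grandma": ["mom", "uncle"],
    "grandpa": ["mom", "uncle"],
    "mom": ["me"],
    "dad": ["me"],
}
```

`is_acyclic(family_tree)` is true, while a loop such as `{"a": ["b"], "b": ["c"], "c": ["a"]}` is rejected because none of its nodes ever reaches zero incoming edges.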
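IPFS uses Merkle DAGs, in which each node has a unique identifier that is the result of hashing the node’s contents. Because a node’s contents include the identifiers of the nodes it links to, changing any node also changes the identifier of every node that links to it, directly or indirectly. This makes Merkle DAGs self-verifying data structures.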
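Here’s a minimal sketch of how Merkle DAG identifiers behave, assuming SHA-256 hex digests as node IDs and a simple length-prefixed encoding of each node. IPFS’s actual node formats (such as dag-pb) and CID encoding differ; the `Node` class and `verify` function are illustrative names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    data: bytes
    children: list["Node"] = field(default_factory=list)

    def node_id(self) -> str:
        """Hash the node's own data together with the IDs of the nodes it links to."""
        h = hashlib.sha256()
        h.update(len(self.data).to_bytes(8, "big"))  # length prefix keeps the encoding unambiguous
        h.update(self.data)
        for child in self.children:
            # Each child ID is a fixed 32-byte digest, so a change anywhere
            # below this node changes this node's ID as well.
            h.update(bytes.fromhex(child.node_id()))
        return h.hexdigest()


def verify(node: Node, expected_id: str) -> bool:
    """Self-verification: recompute the ID from the content and compare."""
    return node.node_id() == expected_id
```

For a root with two leaves, editing one leaf’s data changes the root’s ID while the other leaf’s ID stays the same, so anyone holding only the root ID can detect that something in the tree was altered.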
Content Discovery through Distributed Hash Tables (DHTs)
A distributed hash table (DHT) is a distributed system for mapping keys to their associated values. It is a database of keys and values that is split across the peers on a distributed network. To locate content, you query peers on the network for its CID, and the lookup returns provider records that tell you which peers are storing the blocks that make up the data object you’re requesting.
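The IPFS DHT is based on Kademlia, which places peers and keys in the same ID space and stores each record on the peers whose IDs are closest to the key by XOR distance. The sketch below is a heavily simplified model of that idea: it assumes every peer knows every other peer (real lookups walk routing tables hop by hop), derives IDs with SHA-256, and invents the `ToyDHT` class and its methods purely for illustration.

```python
import hashlib


def key_of(name: str) -> int:
    # Peers and content keys share one 256-bit ID space.
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


class ToyDHT:
    """Stores each provider record (CID -> peers holding it) on the k peers
    whose IDs are closest to the CID's key by XOR distance."""

    def __init__(self, peer_names: list[str], k: int = 2):
        self.k = k
        # Each peer keeps its own slice of the table: CID -> set of providers.
        self.tables: dict[str, dict[str, set[str]]] = {p: {} for p in peer_names}

    def closest_peers(self, cid: str) -> list[str]:
        target = key_of(cid)
        return sorted(self.tables, key=lambda p: key_of(p) ^ target)[: self.k]

    def provide(self, cid: str, provider: str) -> None:
        # Publish a provider record: "provider stores cid".
        for peer in self.closest_peers(cid):
            self.tables[peer].setdefault(cid, set()).add(provider)

    def find_providers(self, cid: str) -> set[str]:
        # Ask the peers responsible for this key which providers they know of.
        found: set[str] = set()
        for peer in self.closest_peers(cid):
            found |= self.tables[peer].get(cid, set())
        return found
```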
Content Addressing vs. Location Addressing
Location addressing is used by client-server models to address content based on its location on the internet, typically through IP addresses. Location addresses have three parts, which are combined into a URL. These parts are:
Scheme: This refers to the protocol being used to serve the address, which is typically HTTPS.
Hostname: This refers to the domain name mapped to the IP address of the server, such as google.com.
Path: This refers to the location of the file on the server, such as /assets/images/image.png.
Altogether, a URL typically looks like this:
https://google.com/assets/images/image.png
Location addressing can cause issues with serving content if the URL changes in any way, such as if the file name changes or the file path gets adjusted. Sometimes, the server may not even be hosting the requested file anymore, and you’re brought to a broken webpage or image.
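The three parts of a location address can be pulled apart with Python’s standard `urllib.parse` module. This minimal sketch uses a URL built from the example parts listed above and shows why location addressing is fragile: renaming the file changes the path, so the old URL no longer points to it, even though the file’s bytes are unchanged. The `location_parts` helper is an illustrative name.

```python
from urllib.parse import urlsplit


def location_parts(url: str) -> dict[str, str]:
    """Split a location address into scheme, hostname, and path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,            # protocol, e.g. "https"
        "hostname": parts.hostname or "",  # domain name that DNS maps to a server
        "path": parts.path,                # where the file sits on that server
    }


original = location_parts("https://google.com/assets/images/image.png")
# Renaming the file on the server changes the path, so the old URL breaks
# even though the file's contents are identical.
renamed = location_parts("https://google.com/assets/images/photo.png")
```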
IPFS uses content addressing instead. Because a file can be hosted simultaneously by many different IPFS peers, identifying it by a single location would be counterproductive.
Content addressing is when a file stored on a peer-to-peer network is addressed by the cryptographic hash of the file’s contents. In IPFS, this cryptographic hash is known as the content identifier or CID. A CID is a string of numbers and letters unique to the cryptographic hash of the file or folder’s contents.
If a file is uploaded to IPFS multiple times and its content has not changed, it produces the same CID each time, provided the same upload settings (such as chunk size and CID version) are used.
Any change to the file’s contents, however small, produces a different CID when uploaded. This ensures that files uploaded to IPFS are immutable, since any change results in a new, unique content identifier.
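The sketch below illustrates both properties. `sha256_hex` is a toy content address; `raw_cidv1` goes one step further and builds a CIDv1 for a single raw block (version byte, the raw codec `0x55`, and a SHA-256 multihash, encoded as lowercase base32 with the `b` multibase prefix). Treat it as an illustration of the CID format only: the CID that Filebase or another IPFS tool reports for a file is generally different, because it also depends on chunking, the DAG layout, and the CID version used.

```python
import base64
import hashlib


def sha256_hex(content: bytes) -> str:
    """Toy content address: depends only on the bytes, not the name or upload time."""
    return hashlib.sha256(content).hexdigest()


def raw_cidv1(content: bytes) -> str:
    """CIDv1 for one raw block: <version 1><codec raw 0x55><multihash>."""
    # Multihash = hash function code (0x12 = sha2-256) + digest length (0x20 = 32) + digest.
    multihash = bytes([0x12, 0x20]) + hashlib.sha256(content).digest()
    cid_bytes = bytes([0x01, 0x55]) + multihash
    # Multibase "b" = lowercase base32 without padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
```

Hashing `b"Hello, IPFS!"` twice gives the same CID, while changing a single character gives a completely different one.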
A single CID can represent either a single file or a folder of files, like a folder containing the files for a static website.
Through content addressing and content identifiers, any IPFS peer can retrieve any given CID as long as the file is being served by at least one peer on the network. For example, if you request a CID from an IPFS peer that is not hosting the file, that peer will look up which other peers are providing the CID, such as through the DHT. Once it finds a peer that has the file, it will fetch the file associated with the CID and return it to you.
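Content addressing also means the requesting peer doesn’t have to trust whoever sends the data: it can rehash what it receives and compare the result with the CID it asked for. The sketch below assumes two caller-supplied functions, `find_providers` (for example, a DHT lookup) and `fetch_from`; both, like `toy_cid`, are hypothetical stand-ins rather than real IPFS APIs, and the SHA-256 hex digest is a simplified CID.

```python
import hashlib
from typing import Callable, Iterable, Optional


def toy_cid(data: bytes) -> str:
    # Simplification: a bare SHA-256 hex digest stands in for a real CID.
    return hashlib.sha256(data).hexdigest()


def retrieve(
    cid: str,
    local_store: dict[str, bytes],
    find_providers: Callable[[str], Iterable[str]],
    fetch_from: Callable[[str, str], Optional[bytes]],
) -> bytes:
    # 1. Serve from the local store if this peer already has the content.
    if cid in local_store:
        return local_store[cid]
    # 2. Otherwise ask who provides the CID (for example, via a DHT lookup).
    for provider in find_providers(cid):
        data = fetch_from(provider, cid)
        # 3. Accept a response only if its hash matches the requested CID.
        if data is not None and toy_cid(data) == cid:
            local_store[cid] = data  # cache it so this peer can serve it later
            return data
    raise LookupError(f"no provider returned valid data for {cid}")
```

If the first provider returns tampered bytes, the hash check rejects them and the function moves on to the next provider.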
To ensure that at least one peer is hosting the file, IPFS offers a feature known as pinning. IPFS pinning refers to marking data to be retained and persisted on one or more IPFS nodes. Pinned data stays on a node until it is unpinned and will not be removed during the IPFS garbage collection process.
When files and data are stored on the IPFS network, nodes on the network cache the files that they download and keep those files available for other nodes on the network. Since storage on these nodes is finite, the cache for each node must be cleared periodically to make room for new files to be cached and made available. The process of clearing the cache for IPFS nodes is referred to as the IPFS garbage collection process.
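A minimal sketch of how pinning interacts with garbage collection, assuming a node’s cache is a simple in-memory map from CID to data. In IPFS, a recursive pin also protects every block that the pinned root links to; this hypothetical `NodeStore` pins individual entries only.

```python
class NodeStore:
    """Hypothetical in-memory cache for one IPFS node."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.pinned: set[str] = set()

    def cache(self, cid: str, data: bytes) -> None:
        # Content a node downloads is cached and can be served to other peers.
        self.blocks[cid] = data

    def pin(self, cid: str) -> None:
        if cid not in self.blocks:
            raise KeyError(f"cannot pin {cid}: not stored on this node")
        self.pinned.add(cid)

    def unpin(self, cid: str) -> None:
        self.pinned.discard(cid)

    def garbage_collect(self) -> list[str]:
        # Free space by deleting every cached entry that is not pinned.
        removed = [cid for cid in self.blocks if cid not in self.pinned]
        for cid in removed:
            del self.blocks[cid]
        return removed
```

Garbage collection removes only the unpinned entries; a pinned entry survives every collection until it is explicitly unpinned.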
The Filebase IPFS Pinning Service
Filebase offers an IPFS pinning service, where all files uploaded to a Filebase IPFS bucket are automatically pinned on 3 different IPFS nodes within the Filebase infrastructure. This ensures 3x redundancy for all IPFS pinned files on the Filebase infrastructure for high availability, resiliency, and reliability.
Some pinning service providers don’t publish provider records to the IPFS DHT, or Distributed Hash Table. The DHT is a distributed key-value system that maps the CID a user requests to the peers hosting that content; in effect, it is a large table of CIDs and the peers that store them. Filebase publishes its provider records to the IPFS DHT to improve performance and reduce retrieval times when a CID is requested.