IPFS

Learn more about Filebase’s integration with the IPFS network.

What is IPFS?

The InterPlanetary File System, or IPFS, is a decentralized peer-to-peer protocol that enables nodes to store and transfer files between one another.

IPFS isn’t inherently a network itself, unlike other decentralized storage networks such as Sia. IPFS is a communication protocol that defines the workflow and components that allow the IPFS network to exist. Software such as the IPFS Desktop client or the IPFS CLI daemon gives IPFS nodes the ability to interact with other nodes running the same software, in turn creating a network of peers that store and share files among themselves.

This section provides a brief overview of IPFS and how it works. For a detailed, technical explanation of IPFS, please review our IPFS Whitepaper:

IPFS vs HTTP

IPFS serves a similar purpose to HTTP, the protocol at the heart of how we use and create content on the internet today. IPFS is relatively new, however, and offers a range of attributes and benefits that HTTP does not.

Note: HTTP in this context refers to both HTTP and HTTPS. HTTPS should be used for all production environments for the most security and reliability of the content.

To bridge the gap between the two protocols, IPFS HTTP gateways allow you to use and build with IPFS by accessing the IPFS network through standard HTTP requests.

HTTP: The Client-Server Model

Traditionally, when you access a webpage there are multiple protocols working together to deliver this website to you. First, the DNS protocol finds the IP address of the server that is tied to the domain name. Then, HTTP is used to request the website from the host server. This workflow is referred to as the client-server model.

While the client-server model is at the forefront of how we interact with and use the internet today, this model is centralized by design and therefore comes with risks such as unreliability, lack of resilience, and single points of failure. The client-server model puts all responsibility on the host server to ensure that the website is constantly available and accessible. If the host server is down due to an outage, disaster, or hardware failure, the website becomes unreachable and inaccessible.

Most notably, the HTTP protocol only sends your request for the website to the host server and doesn’t send your request to other servers that might be able to respond if the host server is down. This is a fundamental difference between HTTP and IPFS.

What About Centralized Cloud Providers?

Another model for website and file hosting or storage is to use cloud providers, like AWS or Google Cloud. This is a common workflow for many applications, websites, and platforms where cloud providers create redundancy and high availability by deploying these assets over multiple servers through high-level abstractions such as CDNs or storage services. Typically, though, these servers are all located in one geographical location, most of the time within the same server rack or row. This means that while there might be more redundancy on the protocol layer, there isn’t redundancy for data center-wide outages, disasters, or human error like a cable being disconnected by accident.

The cloud provider solutions are proprietary to each cloud provider, meaning they are not standardized, open-source, or interoperable due to these solutions being deployed at the hardware or software layers instead of on the protocol layer. This creates a hard vendor lock-in that can trap customers into staying with one provider, even if it isn’t benefiting them as much as another provider could.

Another problem with big cloud providers is that since each cloud provider has such a large market concentration, outages or disasters often have a detrimental effect when something happens like a hardware failure, fire, or even a human error. Outages of this size often affect thousands of websites, services, and platforms, which can bring down even more services if they rely on any of the websites brought down in the outage.

IPFS: The Peer-To-Peer Solution

IPFS is a peer-to-peer communication protocol, meaning instead of each client being connected to a host server like in the client-server model, each client (also referred to as a peer or node) is connected to every other peer to allow it to act as both the client and the server simultaneously. With this configuration, any peer can serve any requested file or website and be a productive member of the network to provide high availability, reliability, and resiliency to network outages or disruptions. With IPFS, peers are able to pool their resources such as storage space or internet bandwidth to ensure that files are always available, resilient to outages, and most importantly, decentralized.

How does IPFS work?

IPFS is unique from other decentralized storage networks because it offers additional features and attributes such as content addressing, directed acyclic graphs (DAGs), and distributed hash tables (DHTs).

Unique Data Identification via Content Addressing

Data stored on IPFS is located through its content address rather than its physical location. When data is stored on IPFS, it is split into a series of pieces, with each piece having its own unique content identifier, or hash. This hash identifies the piece and links it to the other pieces of that data.

Identifying an object, such as a file or a node, by the value of its hash is referred to as content addressing. The hash identifier is known as the Content Identifier or CID. When objects are uploaded to an IPFS bucket on Filebase, the IPFS CID is listed in the object’s metadata for easy reference in any tool or application, or for use with an IPFS gateway.
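The chunk-and-hash idea described above can be sketched in a few lines. This is an illustration only: real IPFS uses multihash-encoded CIDs and configurable chunking strategies, not raw hex SHA-256 digests, and the block size below is an assumption.

```python
import hashlib

def block_hashes(data: bytes, block_size: int = 256 * 1024) -> list[str]:
    """Split data into fixed-size blocks and hash each one.

    Illustrative stand-in for IPFS chunking: each block gets its own
    content-derived identifier.
    """
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

# 600 KiB of data split into 256 KiB blocks yields 3 blocks.
hashes = block_hashes(b"x" * (600 * 1024))
print(len(hashes))  # 3
```

Because each identifier is derived from the block’s bytes, any peer that receives a block can re-hash it and verify it matches the requested identifier.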

Learn more about IPFS CIDs in our deep dive documents about CIDs below:

IPFS CIDs

Content Linking via Directed Acyclic Graphs (DAGs)

Directed acyclic graphs (DAGs) are a hierarchical data structure. A graph is a way to display objects and the relationships between them. A directed graph is a graph whose edges have direction. An acyclic graph is a graph whose edges have definitive ends and do not create a loop back to other objects. Think of a family tree that shows ancestors and their relationships to one another: this is a good example of a directed acyclic graph.

In this context, an object in a graph is referred to as a node and an edge refers to the relation between the objects in a graph.

IPFS uses Merkle DAGs, where each node has a unique identifier that is the result of hashing the node’s contents. Merkle DAGs are a form of self-verified data structures.
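The self-verifying property of a Merkle DAG can be sketched as follows: a node’s identifier is the hash of its own content plus its children’s identifiers, so a change anywhere in the graph propagates up to the root. This is a conceptual sketch, not the actual IPFS (IPLD) node format.

```python
import hashlib

def node_id(content: bytes, child_ids: list[str]) -> str:
    """Identifier = hash of the node's content plus its child IDs."""
    h = hashlib.sha256(content)
    for cid in child_ids:
        h.update(cid.encode())
    return h.hexdigest()

# A tiny two-leaf DAG: a root "manifest" node linking two chunks.
leaf_a = node_id(b"chunk A", [])
leaf_b = node_id(b"chunk B", [])
root = node_id(b"file manifest", [leaf_a, leaf_b])

# Modifying one leaf produces a different root identifier.
new_root = node_id(b"file manifest", [node_id(b"chunk A (edited)", []), leaf_b])
print(root != new_root)  # True
```

This is why a single root CID is enough to verify an entire file or directory tree: the root hash commits to every descendant.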

Content Discovery through Distributed Hash Tables (DHTs)

A distributed hash table (DHT) is a distributed system for mapping keys to their associated values. DHTs are databases of keys and values that are split across all the peers on a distributed network. To locate content, you query peers on the network; the DHT tells you which peers are storing the blocks of content that make up the data object you’re requesting.
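A toy model of a DHT can make the "split across peers" idea concrete: each peer holds only its slice of the CID-to-provider table, and the key’s hash decides which peer is responsible. This is a deliberately simplified stand-in for Kademlia-style routing; the peer names and CID string are made up for illustration.

```python
import hashlib

PEERS = ["peer-0", "peer-1", "peer-2"]

def responsible_peer(cid: str) -> str:
    """Hash the key to pick which peer stores its provider record."""
    bucket = int(hashlib.sha256(cid.encode()).hexdigest(), 16) % len(PEERS)
    return PEERS[bucket]

# Each peer holds only its own slice of the table: CID -> providers.
tables = {p: {} for p in PEERS}

def publish(cid: str, provider: str) -> None:
    tables[responsible_peer(cid)].setdefault(cid, set()).add(provider)

def find_providers(cid: str) -> set:
    return tables[responsible_peer(cid)].get(cid, set())

publish("bafyexamplecid", "node-42")  # hypothetical CID and node name
print(find_providers("bafyexamplecid"))  # {'node-42'}
```

No single peer stores the whole table, yet any peer can resolve any key by routing the lookup to the responsible peer.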

Content Addressing vs. Location Addressing

Location addressing is used by client-server models to address content based on its location on the internet, typically through IP addresses. Location addresses have three parts, which are combined into a URL. These parts are:

  • Scheme: This refers to the protocol being used to serve the address, which typically is HTTPS.

  • Hostname: This refers to the domain name mapped to the IP address of the server, such as google.com

  • Path: This refers to the location of the file on the server, such as /assets/images/image.png.

Altogether, a URL typically looks like this:

https://google.com/assets/images/image.png
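The three parts above can be pulled out of a URL with Python’s standard library, which names them the same way:

```python
from urllib.parse import urlparse

parts = urlparse("https://google.com/assets/images/image.png")
print(parts.scheme)    # https
print(parts.hostname)  # google.com
print(parts.path)      # /assets/images/image.png
```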

Location addressing can cause issues with serving content if the URL changes in any way, such as if the file name changes or the file path gets adjusted. Sometimes, the server may not even be hosting the requested file anymore, and you’re brought to a broken webpage or image.

IPFS uses content addressing instead: since a file can be hosted simultaneously across many different IPFS peers, identifying it by a single location would be counterintuitive.

Content addressing is when a file stored on a peer-to-peer network is addressed by the cryptographic hash of the file’s contents. In IPFS, this cryptographic hash is known as the content identifier or CID. A CID is a string of numbers and letters unique to the cryptographic hash of the file or folder’s contents.

If a file is uploaded to IPFS multiple times and its content has not changed, it will return the same CID each time.

Any change to the file’s contents will produce a different CID when uploaded. This ensures that files uploaded to IPFS are immutable, since any change produces a new, unique content identifier.
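This immutability property can be demonstrated with a plain hash function, used here as an assumption standing in for real CID generation:

```python
import hashlib

def content_address(data: bytes) -> str:
    # Plain SHA-256 hex digest as a stand-in for a real IPFS CID.
    return hashlib.sha256(data).hexdigest()

a = content_address(b"hello ipfs")
b = content_address(b"hello ipfs")   # same bytes -> same address
c = content_address(b"hello ipfs!")  # any change -> new address

print(a == b)  # True
print(a == c)  # False
```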

A single CID can represent either a single file or a folder of files, like a folder containing the files for a static website.

How To Create An IPFS Bucket

Navigate to the Filebase Dashboard Console, create a new bucket, and choose the IPFS option.

Native IPFS URLs

Applications that natively support IPFS content addressing can refer to content stored on IPFS in the format:

ipfs://{CID}/{optional path to resource}

This format doesn’t work for applications or tools that rely on HTTP, such as curl or wget. For these tools, you need to use an IPFS gateway.

IPFS Gateways

Content stored on IPFS can be accessed by using an IPFS gateway. Gateways are used to provide workarounds for applications that don’t natively support IPFS.
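Rewriting a native `ipfs://` URL into a gateway URL is a simple string transformation, since gateways conventionally serve content under `/ipfs/{CID}/{path}`. The gateway hostname below is an assumption for illustration; substitute the gateway you actually use.

```python
def to_gateway_url(ipfs_url: str, gateway: str = "https://ipfs.filebase.io") -> str:
    """Rewrite a native ipfs:// URL into an HTTP gateway URL.

    Assumes the common /ipfs/{CID}/{path} gateway path convention.
    """
    prefix = "ipfs://"
    if not ipfs_url.startswith(prefix):
        raise ValueError("not a native IPFS URL")
    return f"{gateway}/ipfs/{ipfs_url[len(prefix):]}"

# Hypothetical CID used for illustration.
url = to_gateway_url("ipfs://bafyexamplecid/index.html")
print(url)  # https://ipfs.filebase.io/ipfs/bafyexamplecid/index.html
```

The resulting URL can then be fetched with any HTTP client, including curl and wget.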

For more information on IPFS gateways, see below.

IPFS Gateways

IPFS Pinning

Through content addressing and content identifiers, any IPFS peer can retrieve any given CID as long as the file is being served by at least one peer on the network. For example, if you request a CID from an IPFS peer and that peer is not hosting the file, it will search the entire IPFS network for the peer that has the file. Once it finds the peer, it will fetch the file associated with the CID and return it back to you.

To ensure that at least one peer is hosting the file, IPFS offers a feature known as pinning. IPFS pinning refers to the process of specifying data to be retained and persisted on one or more IPFS nodes. Pinning ensures that data remains accessible indefinitely and will not be removed during the IPFS garbage collection process.

When files and data are stored on the IPFS network, nodes on the network cache the files that they download and keep those files available for other nodes on the network. Since storage on these nodes is finite, the cache for each node must be cleared periodically to make room for new files to be cached and made available. The process of clearing the cache for IPFS nodes is referred to as the IPFS garbage collection process.
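The relationship between caching, pinning, and garbage collection can be sketched with a toy block store: cached blocks are evictable, while pinned blocks survive collection. This is a conceptual sketch, not the real IPFS GC logic.

```python
class Node:
    """Toy model of an IPFS node's local block store."""

    def __init__(self):
        self.blocks = {}    # cid -> data (cached and pinned alike)
        self.pinned = set() # cids that must survive garbage collection

    def store(self, cid, data, pin=False):
        self.blocks[cid] = data
        if pin:
            self.pinned.add(cid)

    def garbage_collect(self):
        # Evict everything that is not pinned, freeing space for new blocks.
        self.blocks = {c: d for c, d in self.blocks.items() if c in self.pinned}

node = Node()
node.store("cid-cached", b"block data", pin=False)
node.store("cid-pinned", b"block data", pin=True)
node.garbage_collect()
print(sorted(node.blocks))  # ['cid-pinned']
```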

The Filebase IPFS Pinning Service

Filebase offers an IPFS pinning service, where all files uploaded to a Filebase IPFS bucket are automatically pinned on 3 different IPFS nodes within the Filebase infrastructure. This ensures 3x redundancy for all IPFS pinned files on the Filebase infrastructure for high availability, resiliency, and reliability.

Some pinning service providers don’t publish provider records to the IPFS DHT, or Distributed Hash Table. The DHT is a distributed system that maps keys to values, matching the CID a user requests with the peers that host that content; it is essentially a large table of CIDs and their associated storage peers. Filebase publishes its provider records to the IPFS DHT for the best performance and CID retrieval times.

Learn more about how to pin files to IPFS with Filebase here:

IPFS Pinning

You can sign up for a free Filebase account to get started with IPFS today.

If you have any questions, please join our Discord server, or send us an email at hello@filebase.com
