What is IPFS?
Learn more about IPFS, the peer-to-peer distributed file system
The InterPlanetary File System, or IPFS, is a decentralized peer-to-peer protocol that enables nodes to store and transfer files between one another.
Unlike some other decentralized storage networks, such as Sia, IPFS isn’t inherently a network itself. IPFS is a communication protocol that defines the workflow and components that allow the IPFS network to exist. Software such as the IPFS Desktop client or the IPFS CLI daemon lets IPFS nodes interact with other nodes running the same software, in turn creating a network of peers that store and share files among themselves.
IPFS is similar to HTTP, the protocol at the heart of how we use and create content on the internet today. IPFS is relatively new, however, and it has a number of attributes and benefits that set it apart from HTTP.
Note: HTTP in this context refers to both HTTP and HTTPS. HTTPS should be used in all production environments for security and to protect the integrity of content in transit.
To bridge the gap between the two protocols, IPFS HTTP gateways let you access the IPFS network through ordinary HTTP requests, so you can use and build with IPFS from HTTP-based tools.
Traditionally, when you access a webpage, multiple protocols work together to deliver it to you. First, DNS resolves the domain name to the IP address of the server that hosts the site. Then, HTTP is used to request the website from that host server. This workflow is referred to as the client-server model.
While the client-server model is at the forefront of how we interact with and use the internet today, this model is centralized by design and therefore comes with risks such as unreliability, lack of resilience, and single points of failure. The client-server model puts all responsibility on the host server to ensure that the website is constantly available and accessible. If the host server is down due to an outage, disaster, or hardware failure, the website becomes unreachable and inaccessible.
Most notably, the HTTP protocol only sends your request for the website to the host server and doesn’t send your request to other servers that might be able to respond if the host server is down. This is a fundamental difference between HTTP and IPFS.
Another model for hosting or storing websites and files is to use cloud providers, like AWS or Google Cloud. This is a common workflow for many applications, websites, and platforms: cloud providers create redundancy and high availability by deploying these assets across multiple servers through high-level abstractions such as CDNs or storage services. Often, though, those servers are all located in a single geographic region or even a single data center. This means that while there may be redundancy across individual servers, there is no redundancy against data center-wide outages, disasters, or human error such as a cable being disconnected by accident.
Cloud provider solutions are proprietary to each provider: they are not standardized, open-source, or interoperable, because they are implemented in each provider’s own hardware and software rather than at the protocol layer. This creates a hard vendor lock-in that can trap customers with one provider, even if another provider would serve them better.
Another problem with big cloud providers is that, because each one holds such a large share of the market, a single hardware failure, fire, or human error can have a widespread effect. Outages of this size often affect thousands of websites, services, and platforms, and they can bring down even more services that depend on the affected ones.
IPFS is a peer-to-peer communication protocol. Instead of each client connecting to a single host server, as in the client-server model, each client (also referred to as a peer or node) connects directly to other peers and acts as both a client and a server at the same time. With this configuration, any peer that holds a requested file or website can serve it, so every peer can contribute to the network’s availability, reliability, and resilience to outages or disruptions. With IPFS, peers pool resources such as storage space and internet bandwidth to keep files available, resilient to outages, and, most importantly, decentralized.
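The following Python sketch illustrates the idea that any peer holding a file can serve it, so one peer going offline does not make the file unreachable. The `Peer` class and the `fetch` function are hypothetical names invented for this example; real IPFS nodes find providers through the DHT and exchange blocks with the Bitswap protocol rather than polling a fixed list of peers.

```python
from typing import Dict, List, Optional


class Peer:
    """Hypothetical peer that stores content under string keys."""

    def __init__(self, name: str, online: bool = True) -> None:
        self.name = name
        self.online = online  # simulates whether the peer is reachable
        self.store: Dict[str, bytes] = {}

    def serve(self, key: str) -> Optional[bytes]:
        # An offline peer cannot answer; an online peer answers only if it holds the data.
        if not self.online:
            return None
        return self.store.get(key)


def fetch(key: str, peers: List[Peer]) -> Optional[bytes]:
    """Ask each known peer in turn; the first one that holds the data serves it."""
    for peer in peers:
        data = peer.serve(key)
        if data is not None:
            return data
    return None  # no reachable peer holds the content


peers = [Peer("a", online=False), Peer("b"), Peer("c")]
peers[0].store["site"] = b"<h1>hello</h1>"
peers[1].store["site"] = b"<h1>hello</h1>"
print(fetch("site", peers))  # b'<h1>hello</h1>' (served by "b" while "a" is down)
```

Contrast this with the client-server model, where the equivalent of `peers` would contain only the single host server.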
IPFS differs from other decentralized storage networks because it offers additional features and attributes such as content addressing, directed acyclic graphs (DAGs), and distributed hash tables (DHTs).
Data stored on IPFS is located through its content address rather than its physical location. When data is added to IPFS, it is split into smaller pieces called blocks, and each block is identified by a hash of its contents. Blocks that make up a larger file are linked together by referencing one another’s hashes, so the hash of the top-level block identifies the entire file.
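As a rough illustration, the sketch below splits data into fixed-size pieces and identifies each piece by its SHA-256 digest. It is a simplification under stated assumptions: the 4-byte chunk size and the name `chunk_and_hash` are made up for the example, real CIDs wrap the digest in a multihash and a multibase encoding rather than plain hex, and the Kubo implementation’s default chunker uses 256 KiB pieces.

```python
import hashlib
from typing import List, Tuple

CHUNK_SIZE = 4  # tiny for illustration only


def chunk_and_hash(data: bytes, size: int = CHUNK_SIZE) -> List[Tuple[str, bytes]]:
    """Split data into fixed-size pieces and pair each with its SHA-256 hex digest."""
    pieces = []
    for start in range(0, len(data), size):
        piece = data[start:start + size]
        pieces.append((hashlib.sha256(piece).hexdigest(), piece))
    return pieces


blocks = chunk_and_hash(b"hello, ipfs!")
for digest, piece in blocks:
    print(digest[:12], piece)  # short digest prefix, then the raw piece
```

Note that the pieces are stored as-is: hashing identifies each piece, but it does not hide its contents.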
Identifying an object, such as a file or a node, by the hash of its contents is referred to as content addressing. This hash identifier is known as the Content Identifier, or CID. When objects are uploaded to an IPFS bucket on Filebase, the IPFS CID is listed in the object’s metadata for easy reference in any tool or application, or for use with an IPFS gateway.
Directed acyclic graphs (DAGs) are a hierarchical data structure. A graph is a way to represent objects and the relationships between them. A graph is directed when each of its edges points in one direction, from one object to another. A directed graph is acyclic when it contains no cycles: following the edges in their direction never leads back to an object you have already passed. Think of a family tree that shows ancestors and their relationships to one another; this is a good example of a directed acyclic graph.
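To make the “acyclic” property concrete, here is a small Python sketch that checks whether a directed graph contains a cycle using a depth-first search. The adjacency-list format and the function name `is_acyclic` are choices made for this example, not part of IPFS.

```python
from typing import Dict, List


def is_acyclic(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph has no cycles (depth-first search with colors)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    for targets in graph.values():
        for target in targets:
            color.setdefault(target, WHITE)

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:  # edge back onto the current path: a cycle
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in list(color) if color[n] == WHITE)


# Edges point from parent to child, like a family tree.
family = {
    "grandparent": ["parent"],
    "parent": ["child_a", "child_b"],
    "child_a": [],
    "child_b": [],
}
print(is_acyclic(family))  # True

loop = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(is_acyclic(loop))  # False
```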
In this context, an object in a graph is referred to as a node and an edge refers to the relation between the objects in a graph.
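IPFS uses Merkle DAGs, in which each node’s unique identifier is the hash of the node’s contents, including the identifiers of the nodes it links to. Because any node can be checked by re-hashing it and comparing the result with its identifier, Merkle DAGs are self-verifying data structures.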
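The toy Merkle DAG below is a sketch of that idea, assuming a simplified encoding: each node’s id is a SHA-256 hash over its own bytes plus its children’s ids. Real IPFS nodes use the dag-pb/UnixFS encoding and CIDs instead; `node_id`, `put`, and `verify` are hypothetical helpers written for this example.

```python
import hashlib
from typing import Dict, List, Tuple

# Toy block store: node id -> (node's own bytes, ids of the nodes it links to)
Store = Dict[str, Tuple[bytes, List[str]]]


def node_id(data: bytes, children: List[str]) -> str:
    """Hash the node's bytes together with its children's ids."""
    h = hashlib.sha256()
    h.update(len(data).to_bytes(8, "big"))  # length prefix keeps data and links separate
    h.update(data)
    for child in children:
        h.update(bytes.fromhex(child))  # each child id is a fixed 32-byte digest
    return h.hexdigest()


def put(store: Store, data: bytes, children: List[str]) -> str:
    nid = node_id(data, children)
    store[nid] = (data, children)
    return nid


def verify(store: Store, nid: str) -> bool:
    """Self-verification: recompute every id below nid from the stored content."""
    data, children = store[nid]
    if node_id(data, children) != nid:
        return False
    return all(verify(store, child) for child in children)


store: Store = {}
leaf1 = put(store, b"chapter 1", [])
leaf2 = put(store, b"chapter 2", [])
root = put(store, b"book", [leaf1, leaf2])
print(verify(store, root))  # True

# A revised leaf gets a new id, which in turn gives the parent a new id.
leaf2_revised = put(store, b"chapter 2 (revised)", [])
root_revised = put(store, b"book", [leaf1, leaf2_revised])
print(root_revised != root)  # True

# Tampering with stored bytes is caught when the ids are recomputed.
store[leaf1] = (b"tampered", [])
print(verify(store, root))  # False
```

Because a parent’s id covers its children’s ids, the root id alone is enough to check every block beneath it.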
A distributed hash table (DHT) is a distributed system for mapping keys to their associated values. A DHT is a key-value database split across the peers of a distributed network, with each peer storing only part of the table. To locate content, your node queries peers in the DHT, which return records listing the peers that store the blocks making up the data object you’re requesting.
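The IPFS DHT is based on Kademlia, which measures the distance between IDs with XOR. The sketch below is a single-process toy under heavy simplifying assumptions: every participant knows every peer, peer IDs are derived by hashing made-up names, and each provider record is stored on the two closest peers. Real lookups instead hop iteratively through peers’ routing tables. `ToyDHT` and its methods are hypothetical.

```python
import hashlib
from typing import Dict, List, Set


def key_int(name: str) -> int:
    # Place peers and content keys in one 256-bit ID space.
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, peer_names: List[str], replication: int = 2) -> None:
        self.peers = {name: key_int(name) for name in peer_names}
        self.replication = replication
        # Each peer's slice of the table: content key -> set of provider peers
        self.tables: Dict[str, Dict[str, Set[str]]] = {p: {} for p in peer_names}

    def closest(self, key: str) -> List[str]:
        # Kademlia-style closeness: XOR of the two IDs, smaller is closer.
        k = key_int(key)
        return sorted(self.peers, key=lambda p: self.peers[p] ^ k)[: self.replication]

    def provide(self, key: str, provider: str) -> None:
        # Store a provider record on the peers closest to the key.
        for p in self.closest(key):
            self.tables[p].setdefault(key, set()).add(provider)

    def find_providers(self, key: str) -> Set[str]:
        # Ask the peers closest to the key which peers hold the content.
        found: Set[str] = set()
        for p in self.closest(key):
            found |= self.tables[p].get(key, set())
        return found


dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
dht.provide("block-123", "peer-c")
print(dht.find_providers("block-123"))  # {'peer-c'}
```

The lookup returns who holds the block, not the block itself; the requesting node then fetches the data from those peers.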
Location addressing is used by client-server models to address content based on its location on the internet, typically through IP addresses. Location addresses have three parts, which are combined into a URL. These parts are:
Scheme: This refers to the protocol being used to serve the address, which typically is HTTPS.
Hostname: This refers to the domain name mapped to the IP address of the server, such as google.com
Path: This refers to the location of the file on the server, such as /assets/images/image.png.
Altogether, a URL typically looks like this:
https://google.com/assets/images/image.png
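The three parts of a location address can be pulled out of the example URL with Python’s standard `urllib.parse.urlsplit`:

```python
from urllib.parse import urlsplit

parts = urlsplit("https://google.com/assets/images/image.png")
print(parts.scheme)    # https
print(parts.hostname)  # google.com
print(parts.path)      # /assets/images/image.png
```

Every part describes where the file lives, not what it contains, which is why renaming or moving the file breaks the URL.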
Location addressing can cause issues with serving content if the URL changes in any way, such as if the file name changes or the file path gets adjusted. Sometimes, the server may not even be hosting the requested file anymore, and you’re brought to a broken webpage or image.
IPFS uses content addressing instead. Because a file can be hosted simultaneously by many different IPFS peers, identifying it by a single location would be counterintuitive.
Content addressing is when a file stored on a peer-to-peer network is addressed by the cryptographic hash of the file’s contents. In IPFS, this cryptographic hash is known as the content identifier or CID. A CID is a string of numbers and letters unique to the cryptographic hash of the file or folder’s contents.
If a file is uploaded to IPFS multiple times and its content has not changed, it will return the same CID each time, provided the same import settings (such as chunk size and CID version) are used.
Any change to the file’s contents, however small, produces a different CID when uploaded. This ensures that files uploaded to IPFS are immutable: a given CID always refers to the same content, and any change produces a new, unique content identifier.
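A quick way to see both properties is to hash some bytes directly. The sketch below uses a plain SHA-256 hex digest as a simplified stand-in for a CID (an assumption for illustration: real CIDs also encode the CID version, codec, and hash function); `content_id` is a hypothetical helper.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Simplified stand-in for a CID: the hex SHA-256 digest of the content."""
    return hashlib.sha256(data).hexdigest()


first = content_id(b"hello world")
again = content_id(b"hello world")
edited = content_id(b"hello world!")  # one extra character
print(first == again)   # True: same content, same identifier
print(first == edited)  # False: any change gives a new identifier
```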
A single CID can represent either a single file or a folder of files, like a folder containing the files for a static website.
Applications that natively support IPFS content addressing can refer to content stored on IPFS in the format:
ipfs://{CID}/{optional path to resource}
This format doesn’t work for applications or tools that rely on HTTP, such as curl or Wget. For these tools, you need to use an IPFS gateway.
Content stored on IPFS can be accessed by using an IPFS gateway. Gateways are used to provide workarounds for applications that don’t natively support IPFS.
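As a sketch of what a gateway does for HTTP-only tools, the function below rewrites an `ipfs://{CID}/{path}` address into the path-style gateway form `https://{gateway}/ipfs/{CID}/{path}`. The gateway hostname `gateway.example.com` and the CID `bafyexamplecid` are placeholders, and `to_gateway_url` is a hypothetical helper; substitute the gateway you actually use.

```python
PREFIX = "ipfs://"


def to_gateway_url(ipfs_uri: str, gateway: str = "gateway.example.com") -> str:
    """Rewrite ipfs://{CID}/{path} as https://{gateway}/ipfs/{CID}/{path}."""
    if not ipfs_uri.startswith(PREFIX):
        raise ValueError("expected a URI starting with ipfs://")
    cid, _, path = ipfs_uri[len(PREFIX):].partition("/")
    if not cid:
        raise ValueError("missing CID")
    url = f"https://{gateway}/ipfs/{cid}"
    return f"{url}/{path}" if path else url


print(to_gateway_url("ipfs://bafyexamplecid/index.html"))
# https://gateway.example.com/ipfs/bafyexamplecid/index.html
```

The resulting URL can then be fetched with any HTTP tool, for example `curl https://gateway.example.com/ipfs/bafyexamplecid/index.html`, with both the hostname and the CID replaced by real values.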
For more information on IPFS gateways, see What is an IPFS Gateway?
IPFS pinning refers to the process of specifying data to be retained and persisted on one or more IPFS nodes. Pinned data is kept on those nodes for as long as it remains pinned, and it will not be removed during the IPFS garbage collection process.
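The interaction between pinning and garbage collection can be sketched with a toy node that keeps a block store and a set of pinned identifiers. This is a simplified model, not the IPFS API: `ToyNode` and its methods are hypothetical, the identifiers are placeholder strings rather than real CIDs, and real IPFS pins can be recursive, keeping every block beneath a pinned root.

```python
from typing import Dict, Set


class ToyNode:
    """Hypothetical node: a local block store plus a set of pinned CIDs."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.pins: Set[str] = set()

    def add(self, cid: str, data: bytes, pin: bool = False) -> None:
        self.blocks[cid] = data
        if pin:
            self.pins.add(cid)

    def unpin(self, cid: str) -> None:
        self.pins.discard(cid)

    def garbage_collect(self) -> Set[str]:
        """Delete every block that is not pinned and return the CIDs removed."""
        removed = set(self.blocks) - self.pins
        for cid in removed:
            del self.blocks[cid]
        return removed


node = ToyNode()
node.add("cid-kept", b"important data", pin=True)
node.add("cid-cached", b"fetched once, never pinned")
print(node.garbage_collect())  # {'cid-cached'}
print(sorted(node.blocks))     # ['cid-kept']
```

Unpinning does not delete anything by itself; the data simply becomes eligible for removal at the next garbage collection.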
For more information on IPFS pinning, see What is IPFS Pinning?
Filebase offers an IPFS pinning service: all files uploaded to a Filebase IPFS bucket are automatically pinned to the IPFS network.
Learn more about how to pin files to IPFS with Filebase in What is IPFS Pinning?