Deep Dive: Decentralized Storage
Learn about what decentralized storage is, how it works, and how to transition from centralized storage to decentralized storage.
To understand decentralized storage, you first need to understand how centralized storage works.
Centralized storage is the type of storage that most people currently use in their daily lives. Mobile phones, laptops, and tablets all rely on it: everything saved to the hard drive or SD card in one of these devices is stored in one place, as a single copy.
Data centers are also a form of centralized storage. A file stored on a server in a data center lives in one geographical location, typically on one single server. Data is not spread across different servers within the data center unless replication is explicitly configured. Redundancy schemes such as RAID protect against a failed drive, but even then every copy of the file sits in the same location.
This means that when data from a device such as a phone or laptop is backed up to cloud storage, there might be two copies of the data, but each copy lives in its own form of centralized storage. If something happens to your phone that compromises or destroys the data, you have to rely on the backup to retrieve it. If something has happened to that backup, such as a fire, a natural disaster, or an outage, your data is inaccessible even though you were diligent and backed it up for exactly these situations.
The problem with centralized storage is that if something happens to it, the data is gone unless you have a backup plan and an active backup method. If you’ve lost your phone and never backed it up to iCloud, or if the last backup ran 8 months before you lost the phone, all the data added since that backup is gone because it was stored in one centralized location. If you then go to iCloud to retrieve the backed-up data and it is inaccessible or corrupt, you’ve lost not just 8 months of data, but everything.
This is a huge weakness of centralized storage: no matter how vigilant you are about backing up your data regularly, it can still be lost if the cloud storage provider is hit by an outage or disaster.
Decentralized storage can be visualized as the backbone of the Web3 ecosystem. Data must be stored somewhere before it can be retrieved, edited, viewed, or used. Keeping that data in a centralized manner goes directly against the values of Web3, which has decentralization at its core.
Decentralized storage networks are peer-to-peer networks made up of nodes that provide storage resources for the network to use for transactions and data storage. Each node on the network is an individual entity, such as a home computer or a dedicated server, that has been added to the network using the network's software. With this configuration, decentralized storage networks can use storage capacity that already exists across the globe and would otherwise sit unused. Since existing storage is used, there are no additional costs for adding new hard drives, building and maintaining data centers, or employing data center staff. As a result, decentralized storage can be offered at a significantly cheaper price than storage provided by a centralized provider.
On a decentralized storage network, files are split into multiple chunks using erasure coding, then stored across the nodes on the network worldwide. A node never has access to the complete file, only to individual pieces, which provides an innate layer of security. The only way for a file to be retrieved is for the user who uploaded it to request that it be downloaded. The file is then reconstructed from the pieces stored across the globe and sent to the user to be viewed or downloaded.
The best way to visualize how decentralized storage works is to think about how online orders are processed and shipped.
Say you place an order on the website Chewy, a pet supplies marketplace with warehouses all over the United States. Your order contains three different items: a dog toy, a dog treat, and some dog food. Since there are warehouses all over the country, it's unlikely that each warehouse has all three of these items in stock at the same time. To fulfill your order, each item gets shipped from whatever warehouse has the item in stock.
When you store a file on a decentralized storage network, the file gets broken apart into a number of different pieces. That number varies based on the network you’re using; on the Sia network, for example, the file gets broken into 30 pieces. This process is called erasure coding. Each piece is then individually encrypted with a special algorithm and stored across the world in a wide variety of locations, which also vary based on the network you store data on. When you go to access or download your file, it's like when your order gets shipped from Chewy: your file is pieced together from each location it's stored at, then sent to you. But unlike a Chewy order that takes a few days to arrive, your file is ready to access in just a few seconds.
Below are the steps that happen when a user goes directly to a decentralized storage network, such as Sia, themselves.
- Before a file is uploaded to a decentralized storage network, it is best practice for the user to encrypt the file with an encryption key that they maintain themselves. This is optional but highly recommended for maximum data security. The user who is uploading the file is known as the ‘renter’ since they are considered to be ‘renting’ the storage from the network.
- The renter uploads the file to the decentralized storage network. The file is divided into multiple pieces, then expanded with parity blocks according to the network’s erasure coding algorithm configuration.
- Each piece of a file is known as a shard. The shards are then stored across multiple nodes on the decentralized network. These nodes are referred to as ‘providers’ or ‘farmers’.
- Each provider that receives a data shard is unable to access or view the content of that shard. This provides protection for the data if a bad actor compromises the node.
- When the user requests that their file be downloaded, the data gets reconstructed using the minimum number of shards determined by the erasure coding algorithm. Each provider node sends the shard that it stores, which is authenticated using the network’s hash table and the user’s credentials, which are often in the format of an access key pair.
- Lastly, the user decrypts their file with their encryption key.
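The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Sia or any other network is implemented: the providers are plain in-memory dictionaries, the erasure code is a single XOR parity shard (real networks typically use stronger codes such as Reed-Solomon), the `manifest` of shard hashes stands in for the network’s hash table, and client-side encryption is left out because it needs a vetted cryptography library. All function names are hypothetical.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal-size data shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def upload(data: bytes, k: int, providers: list) -> dict:
    """Store shard i on provider i and return a manifest kept by the renter."""
    shards = encode(data, k)
    for i, shard in enumerate(shards):
        providers[i][i] = shard
    hashes = [hashlib.sha256(s).hexdigest() for s in shards]
    return {"size": len(data), "k": k, "hashes": hashes}


def download(manifest: dict, providers: list) -> bytes:
    """Fetch shards, verify each hash, and rebuild at most one missing shard."""
    k = manifest["k"]
    shards = []
    for i in range(k + 1):
        shard = providers[i].get(i)
        if shard is not None:
            if hashlib.sha256(shard).hexdigest() != manifest["hashes"][i]:
                shard = None  # treat a corrupt shard as missing
        shards.append(shard)
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("too many shards lost for single-parity recovery")
    if missing:
        # The XOR of all surviving shards equals the missing shard.
        survivors = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(shards[:k])[:manifest["size"]]


providers = [{} for _ in range(5)]  # five simulated provider nodes
manifest = upload(b"hello decentralized storage", k=4, providers=providers)
providers[2].clear()  # simulate one provider going offline
assert download(manifest, providers) == b"hello decentralized storage"
```

With a single parity shard, this sketch survives the loss of exactly one shard. Codes with more parity shards tolerate more losses at the cost of more stored data.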
When the file is broken apart into multiple pieces and erasure-coded, not all of those pieces are required to reassemble the file so it can be accessed or downloaded. This feature is in place specifically to protect data integrity and accessibility. For example, on the Sia network, where files are broken into 30 pieces, only 10 of those pieces are needed to reassemble the file. That means that ⅔ of the file chunks can be offline, corrupt, or otherwise inaccessible, and you can still access your file. You won’t even know that there are offline or corrupt chunks; it won’t change how you access your file at all. No file can be read without the minimum number of pieces, and only you can retrieve and decrypt those pieces because of the encryption applied when the file is erasure-coded and stored, so there's no concern about the owner of a node that stores a piece of the file being able to access the file.
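The arithmetic behind a scheme like this can be checked with a short calculation. The sketch below uses the 10-of-30 figures from the Sia example; the function names are hypothetical, and the availability estimate assumes that each piece is reachable independently with the same probability, which is a simplification of how real hosts behave.

```python
from math import comb


def redundancy_summary(total_pieces: int, required_pieces: int) -> dict:
    """Summarize how much loss a k-of-n scheme tolerates and what it costs."""
    tolerated = total_pieces - required_pieces
    return {
        "tolerated_losses": tolerated,
        "tolerated_fraction": tolerated / total_pieces,
        # Each piece is 1/required_pieces of the file, so storing
        # total_pieces of them takes this many times the original size.
        "storage_overhead": total_pieces / required_pieces,
    }


def availability(total_pieces: int, required_pieces: int, piece_uptime: float) -> float:
    """Probability that at least required_pieces pieces are reachable.

    Assumes every piece is reachable independently with probability
    piece_uptime (a simplifying assumption, not a measured value).
    """
    return sum(
        comb(total_pieces, i)
        * piece_uptime ** i
        * (1 - piece_uptime) ** (total_pieces - i)
        for i in range(required_pieces, total_pieces + 1)
    )


summary = redundancy_summary(total_pieces=30, required_pieces=10)
print(summary["tolerated_losses"])   # 20 pieces can be lost
print(summary["storage_overhead"])   # 3.0 times the original file size
print(availability(30, 10, piece_uptime=0.9))
```

The trade-off is visible in the numbers: tolerating the loss of 20 out of 30 pieces means storing three times the size of the original file.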
Going back to our Chewy example, this is like when you place an order, but one of the items in your order has been inventoried incorrectly, so when the warehouse staff goes to pull the item, it's actually out of stock there. The warehouse staff routes your request to another warehouse that has the item, so the item still gets delivered to you. They don’t cancel your entire order for a missing item; they simply send it from another warehouse. It’s the same concept with decentralized storage: if one file chunk can’t be accessed, the file can still be accessed and downloaded with no interruption to you, because the network simply uses a chunk stored in another location to piece together the file and send it to you.
One of the biggest advantages is the innate security that comes with decentralized storage. Since each file is erasure coded, encrypted, and stored across the globe, and you need at least ⅓ of the pieces to access the file, files are secure not only in integrity and accessibility but also in data privacy. No one can access the data file’s chunks besides the user who uploaded them. The exception to this is data stored on IPFS. Data stored on IPFS is public by default, since IPFS files are accessed with their content identifier through an IPFS gateway address.
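To see why content-addressed data is effectively public, consider the minimal sketch below. It is a simplified model, not the IPFS implementation: a plain SHA-256 hex digest stands in for a real content identifier (actual IPFS CIDs use additional encoding), and the `ContentAddressedStore` class is hypothetical.

```python
import hashlib


class ContentAddressedStore:
    """Toy store where the lookup key is derived from the content itself."""

    def __init__(self) -> None:
        self._blocks = {}

    def put(self, data: bytes) -> str:
        """Store data and return its identifier (a hash of the content)."""
        identifier = hashlib.sha256(data).hexdigest()
        self._blocks[identifier] = data
        return identifier

    def get(self, identifier: str) -> bytes:
        """Return the data for an identifier; no credentials are involved."""
        data = self._blocks[identifier]
        if hashlib.sha256(data).hexdigest() != identifier:
            raise ValueError("stored content does not match its identifier")
        return data


store = ContentAddressedStore()
identifier = store.put(b"public by default")
# Anyone who learns the identifier can fetch the content.
assert store.get(identifier) == b"public by default"
# The same content always maps to the same identifier.
assert store.put(b"public by default") == identifier
```

Because the identifier is the only thing needed to fetch the data, sensitive files should be encrypted before they are stored on a network that works this way.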
Another advantage of decentralized storage is reliability. Since decentralized storage is just that, decentralized, it has no single point of failure that would take the network down and make the data inaccessible. This means there are no more crippling outages that disrupt workflows or result in a loss of business.
In the past, it has been hard to transition from centralized to decentralized storage, but Filebase is intended to be an easy on-ramp for everyone making that move. Traditionally, users have had to manage things like contracts and cryptocurrency to use decentralized storage. At Filebase, we manage all of that for you and give you the ability to store data across different decentralized networks. We currently support IPFS and Sia. Filebase also doesn’t impose the restrictions that you’d face when storing directly on these decentralized networks, such as minimum file sizes or data retention limitations.
Filebase is the first S3-compatible decentralized storage platform, which means that almost any product, tool, or piece of code that works with Amazon S3 can be configured to work with Filebase with ease. This makes the transition seamless for developers and enterprises, and for the everyday user as well. You can use Filebase from our easy-to-use web dashboard, or you can configure your favorite backup tool to point to Filebase. Before Filebase, the transition was hard, but today we aim to be the on-ramp that helps you move from Web2 centralized storage to Web3 decentralized storage.
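As a rough illustration of what S3 compatibility means in practice, the sketch below points the widely used `boto3` S3 client at a custom endpoint. Several details are assumptions rather than facts taken from this article: the endpoint URL should be checked against the current Filebase documentation, the environment variable names are hypothetical, and the bucket and object names in the usage note are placeholders.

```python
import os

# Assumption: verify this endpoint against the Filebase documentation.
FILEBASE_ENDPOINT = "https://s3.filebase.com"


def s3_client_kwargs(access_key: str, secret_key: str) -> dict:
    """Build the keyword arguments for an S3 client that targets Filebase."""
    return {
        "service_name": "s3",
        "endpoint_url": FILEBASE_ENDPOINT,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def upload_file(local_path: str, bucket: str, key: str) -> None:
    """Upload one local file to a bucket (requires the boto3 package)."""
    import boto3  # third-party; imported here so the helper above has no dependency

    client = boto3.client(
        **s3_client_kwargs(
            os.environ["FILEBASE_ACCESS_KEY"],  # hypothetical variable name
            os.environ["FILEBASE_SECRET_KEY"],  # hypothetical variable name
        )
    )
    client.upload_file(local_path, bucket, key)
```

With credentials set in the environment, a call such as `upload_file("backup.tar.gz", "my-bucket", "backups/backup.tar.gz")` would upload the file. The only change from a typical Amazon S3 setup is the `endpoint_url` value.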