Development of Decentralized Data Storage Systems
If you're reading this after your AWS S3 bucket caused regulatory issues, or after yet another "we're upgrading our infrastructure" notice from a centralized storage provider — welcome. Decentralized data storage isn't about ideology; it's about specific properties: absence of a single point of failure, content verifiability through content addressing, and the ability to store data without an operator's permission.
In 2024, "decentralized storage" refers to three fundamentally different stacks: IPFS + Filecoin, Arweave, and Storj/Sia (incentivized distributed storage). Each suits its own class of problems.
IPFS + Filecoin: content addressing and storage economics
IPFS isn't storage; it's addressing and transport. The Content Identifier (CID) is a multihash of the content, computed by default with SHA2-256 over the file's DAG representation (dag-pb with UnixFS) — for chunked files it hashes the DAG root, not the raw bytes. If the file changes, the CID changes. This fundamental property makes IPFS a natural fit for NFT metadata, verifiable documents, and immutable artifacts.
IPFS's problem is persistence. Nothing in the protocol keeps content alive: if no node pins it, it is eventually garbage-collected and disappears from the network. Filecoin adds the economics: storage providers earn FIL for storing data under cryptographically verified deals.
Working with IPFS Cluster
Production systems require IPFS Cluster — a replication coordinator across multiple IPFS nodes. Minimum configuration: 3 nodes, replication factor 2.
```go
// Example: pinning via the IPFS Cluster REST API (POST /pins/{cid}).
package cluster

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type ClusterPinRequest struct {
	CID            string            `json:"cid"`
	ReplicationMin int               `json:"replication-min"`
	ReplicationMax int               `json:"replication-max"`
	Name           string            `json:"name"`
	Meta           map[string]string `json:"meta"`
}

func PinToCluster(cid string, name string) error {
	req := ClusterPinRequest{
		CID:            cid,
		ReplicationMin: 2,
		ReplicationMax: 3,
		Name:           name,
	}
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	resp, err := http.Post(
		"http://cluster-api:9094/pins/"+cid,
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("cluster pin failed: %s", resp.Status)
	}
	return nil
}
```
For large file uploads, using a chunking strategy is critical. By default, IPFS uses size-262144 (256 KB chunks), but for videos and large binaries, the rabin chunker provides better deduplication through content-defined boundaries:
```bash
ipfs add --chunker=rabin-262144-524288-1048576 large_file.bin
```
Filecoin Storage Deals
Direct work with Filecoin via Lotus is the low-level path. For production, use Estuary (deprecated) or modern Lighthouse SDK / web3.storage:
```typescript
import { Web3Storage } from 'web3.storage'

const client = new Web3Storage({ token: process.env.W3S_TOKEN })

async function storeWithReplication(files: File[]): Promise<string> {
  const cid = await client.put(files, {
    wrapWithDirectory: false,
    onRootCidReady: (rootCid) => {
      console.log('Root CID:', rootCid) // available before upload completes
    },
    onStoredChunk: (size) => {
      console.log(`Uploaded chunk of ${size} bytes`)
    }
  })
  return cid
}
```
web3.storage handles hot IPFS pinning plus a cold Filecoin deal automatically. For NFT-specific workloads there is NFT.Storage, with a similar API but focused on NFT metadata and assets.
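Once content is uploaded, the hot/cold split can be monitored: the deprecated web3.storage v4 client exposed `client.status(cid)` returning pin and Filecoin-deal state. A hedged sketch of summarizing such a response (the response shape below is an assumption modeled on that client; verify against your SDK version):

```typescript
// Summarize a storage-status response: how many replicas pin this CID
// and how many active Filecoin deals back it.
// NOTE: this interface is an assumption modeled on the deprecated
// web3.storage v4 client's status response, not a guaranteed API shape.
interface StorageStatus {
  cid: string
  dagSize: number
  pins: { status: string }[]
  deals: { status: string; dealId?: number }[]
}

function summarizeStatus(s: StorageStatus): { pinned: number; activeDeals: number } {
  return {
    pinned: s.pins.filter((p) => p.status === 'Pinned').length,
    activeDeals: s.deals.filter((d) => d.status === 'Active').length,
  }
}

// Usage with a real client would be: const status = await client.status(cid)
const example: StorageStatus = {
  cid: 'bafy...',
  dagSize: 1024,
  pins: [{ status: 'Pinned' }, { status: 'Pinning' }],
  deals: [{ status: 'Active', dealId: 1 }, { status: 'Queued' }],
}
console.log(summarizeStatus(example)) // { pinned: 1, activeDeals: 1 }
```

A useful alerting rule: treat content as at-risk when `pinned` drops below your replication minimum or `activeDeals` hits zero.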
Arweave: permanent storage with one-time payment
Arweave offers a different model: pay once, data stored "forever" (endowment fund designed for 200+ years with conservative storage cost assumptions). This fundamentally changes use cases.
When Arweave is the right choice:
- Smart contract source code and ABI (permanent verifiability)
- NFT metadata and media (avoid NFT rot)
- Legal and notarial documents
- Governance protocols and voting results (DAO governance history)
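Because payment is one-time, cost estimation matters before committing. A gateway's `/price/{bytes}` endpoint (part of the Arweave HTTP API) returns the fee in winston as a plain string, where 1 AR = 10^12 winston. A minimal sketch:

```typescript
// Estimate the one-time Arweave cost for a payload of a given size.
// GET /price/{bytes} on a gateway returns the fee in winston (string);
// 1 AR = 10^12 winston.
const WINSTON_PER_AR = 1e12

function winstonToAr(winston: string): number {
  return Number(winston) / WINSTON_PER_AR
}

async function estimateArCost(bytes: number, gateway = 'https://arweave.net'): Promise<number> {
  const res = await fetch(`${gateway}/price/${bytes}`)
  return winstonToAr(await res.text())
}

// Pure conversion check, no network: 2.5 * 10^12 winston = 2.5 AR
console.log(winstonToAr('2500000000000')) // 2.5
```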
Data in Arweave is a transaction with a data field and tags. Tags are the key to indexing via GraphQL:
```typescript
import Arweave from 'arweave'
import type { JWKInterface } from 'arweave/node/lib/wallet'

const arweave = Arweave.init({
  host: 'arweave.net',
  port: 443,
  protocol: 'https'
})

async function uploadDocument(
  data: Buffer,
  mimeType: string,
  metadata: Record<string, string>,
  jwk: JWKInterface // wallet key used to sign the transaction
) {
  const tx = await arweave.createTransaction({ data }, jwk)
  tx.addTag('Content-Type', mimeType)
  tx.addTag('App-Name', 'YourDApp')
  tx.addTag('Version', '1.0.0')
  // Custom tags for GraphQL search
  for (const [key, value] of Object.entries(metadata)) {
    tx.addTag(key, value)
  }
  await arweave.transactions.sign(tx, jwk)
  await arweave.transactions.post(tx)
  return tx.id // permanent document ID
}
```
Query Arweave via ArQL / GraphQL to search documents by tags:
```graphql
query FindDocuments($owner: String!, $docType: String!) {
  transactions(
    owners: [$owner]
    tags: [
      { name: "App-Name", values: ["YourDApp"] }
      { name: "Document-Type", values: [$docType] }
    ]
    first: 100
  ) {
    edges {
      node {
        id
        tags { name value }
        block { timestamp }
      }
    }
  }
}
```
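Queries like this can be posted to a public gateway's `/graphql` endpoint (e.g. `arweave.net/graphql`). A small sketch of the request and of pulling IDs out of the response envelope (`findByOwner` and the simplified query are illustrative, not from the original):

```typescript
// POST a GraphQL query to an Arweave gateway and extract transaction IDs.
// The /graphql endpoint and { data: { transactions: { edges } } } envelope
// follow the arweave.net gateway's GraphQL API.
const QUERY = `query($owner: String!) {
  transactions(owners: [$owner], first: 10) {
    edges { node { id } }
  }
}`

interface GqlResponse {
  data: { transactions: { edges: { node: { id: string } }[] } }
}

function extractIds(resp: GqlResponse): string[] {
  return resp.data.transactions.edges.map((e) => e.node.id)
}

async function findByOwner(owner: string): Promise<string[]> {
  const res = await fetch('https://arweave.net/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: QUERY, variables: { owner } }),
  })
  return extractIds(await res.json() as GqlResponse)
}

// Shape check without the network:
const sample: GqlResponse = { data: { transactions: { edges: [{ node: { id: 'tx1' } }] } } }
console.log(extractIds(sample)) // [ 'tx1' ]
```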
Bundlr / Irys: batch upload and instant finality
Native Arweave transactions confirm in ~2 minutes, which is problematic for UX. Irys (formerly Bundlr) solves this with a layer 2 over Arweave: uploads receive an instant receipt, are batched, and are later posted to Arweave.
```typescript
import Irys from '@irys/sdk'

const irys = new Irys({
  url: 'https://node1.irys.xyz',
  token: 'ethereum',
  key: privateKey,
})

// Check cost before upload
const price = await irys.getPrice(data.length)
console.log(`Cost: ${irys.utils.fromAtomic(price)} ETH`)

// Upload with tags
const receipt = await irys.upload(data, {
  tags: [
    { name: 'Content-Type', value: 'application/json' },
    { name: 'Contract-Address', value: contractAddress },
  ]
})
// receipt.id — TXID, available immediately at https://gateway.irys.xyz/{id}
```
Hybrid architecture: hot + cold storage
Real systems rarely use just one protocol. Typical architecture for DApps with performance and permanence requirements:
```
Data write:
  User → App Backend → [in parallel]:
    1. IPFS Cluster    (hot, fast access, ~3 replicas)
    2. Irys → Arweave  (cold, permanent, 1-2 min)
    3. PostgreSQL      (CID + Arweave TXID + metadata, for search)

Data read:
  User → App Backend → PostgreSQL (look up CID/TXID)
    → IPFS Gateway     (fast if pinned)
    → fallback: Arweave Gateway (if IPFS unavailable)
```
This provides instant reads via IPFS, permanence guarantees via Arweave, and search via regular database.
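The parallel write step can be sketched with `Promise.allSettled` and an acceptance policy: the index write and at least one storage tier must succeed, and failed tiers are retried later. `pinToCluster`, `uploadToIrys`, and `saveIndex` would be app-level functions; the stand-ins below are hypothetical:

```typescript
// Fan one document out to hot storage, cold storage, and the search index.
// Policy: accept the write if the index and at least one storage tier
// succeed; a single failed tier is queued for retry, not a hard error.
type WriteResult = { tier: string; ok: boolean }

async function writeEverywhere(
  tasks: Record<string, () => Promise<unknown>>
): Promise<WriteResult[]> {
  const names = Object.keys(tasks)
  const settled = await Promise.allSettled(names.map((n) => tasks[n]()))
  return settled.map((r, i) => ({ tier: names[i], ok: r.status === 'fulfilled' }))
}

function writeAccepted(results: WriteResult[]): boolean {
  const ok = new Set(results.filter((r) => r.ok).map((r) => r.tier))
  return ok.has('index') && (ok.has('hot') || ok.has('cold'))
}

// Usage with stand-ins for the real pinToCluster / uploadToIrys / saveIndex:
writeEverywhere({
  hot:   async () => 'pinned',
  cold:  async () => { throw new Error('irys timeout') },
  index: async () => 'row inserted',
}).then((results) => {
  console.log(writeAccepted(results)) // true: index + one storage tier succeeded
})
```

The asymmetry is deliberate: losing the cold copy temporarily is recoverable, losing the index entry means the CID becomes unfindable.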
Integrity verification
Content addressing gives IPFS built-in verification: the CID is itself a hash of the content. For Arweave, the simplest check is to re-download the data and re-hash it:
```typescript
import { createHash } from 'node:crypto'

async function verifyArweaveData(txId: string, expectedHash: string): Promise<boolean> {
  await arweave.transactions.get(txId) // throws if the tx doesn't exist
  const data = await arweave.transactions.getData(txId, { decode: true })
  const hash = createHash('sha256').update(data as Buffer).digest('hex')
  return hash === expectedHash
}
```
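On the IPFS side, the check falls out of content addressing itself: recompute the CID from the fetched bytes and compare. As an illustration, here is a dependency-free sketch for the simplest case, a raw block CIDv1 (codec 0x55, SHA2-256, base32lower); real files chunked through UnixFS/dag-pb need a library such as multiformats instead:

```typescript
import { createHash } from 'node:crypto'

// base32 lower, RFC 4648 alphabet, no padding (the multibase 'b' encoding)
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567'

function base32encode(bytes: Uint8Array): string {
  let bits = 0, value = 0, out = ''
  for (const b of bytes) {
    value = (value << 8) | b
    bits += 8
    while (bits >= 5) {
      out += ALPHABET[(value >>> (bits - 5)) & 31]
      bits -= 5
    }
  }
  if (bits > 0) out += ALPHABET[(value << (5 - bits)) & 31]
  return out
}

// CIDv1 = version(0x01) ++ codec(0x55 raw) ++ multihash(0x12 sha2-256, 0x20 length, digest)
function rawCidV1(data: Buffer): string {
  const digest = createHash('sha256').update(data).digest()
  const bytes = Buffer.concat([Buffer.from([0x01, 0x55, 0x12, 0x20]), digest])
  return 'b' + base32encode(bytes) // 'b' = multibase prefix for base32lower
}

// Verification: recompute the CID from the fetched bytes and compare.
function verifyRawBlock(cid: string, data: Buffer): boolean {
  return rawCidV1(data) === cid
}

const block = Buffer.from('hello decentralized storage')
const cid = rawCidV1(block)
console.log(verifyRawBlock(cid, block))                   // true
console.log(verifyRawBlock(cid, Buffer.from('tampered'))) // false
```

This is why gateway responses are trustworthy even from untrusted gateways: the client can always re-derive the CID locally.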
Encryption and access control
Decentralized storage doesn't mean public. For sensitive data — Lit Protocol for threshold encryption with on-chain access conditions:
```typescript
import * as LitJsSdk from '@lit-protocol/lit-node-client'

// Access condition: holder of at least one NFT from the collection
const accessControlConditions = [{
  contractAddress: NFT_CONTRACT,
  standardContractType: 'ERC721',
  chain: 'ethereum',
  method: 'balanceOf',
  parameters: [':userAddress'],
  returnValueTest: { comparator: '>', value: '0' }
}]

// Encrypt before uploading to IPFS
const { ciphertext, dataToEncryptHash } = await LitJsSdk.encryptString(
  { accessControlConditions, dataToEncrypt: sensitiveData },
  litNodeClient
)

// Decrypt — only if the on-chain condition is met
const decrypted = await LitJsSdk.decryptToString(
  { accessControlConditions, ciphertext, dataToEncryptHash, chain: 'ethereum' },
  litNodeClient
)
```
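The ciphertext, its hash, and the access conditions must travel together, since decryption needs all three. A small sketch of an envelope to store on IPFS (the field names and `version` field are an app-level convention of this sketch, not part of the Lit SDK):

```typescript
// Envelope stored on IPFS: everything a reader needs to request decryption.
// NOTE: this shape is an app-level convention, not a Lit Protocol format.
interface EncryptedEnvelope {
  version: 1
  ciphertext: string
  dataToEncryptHash: string
  accessControlConditions: unknown[]
}

function packEnvelope(
  ciphertext: string,
  dataToEncryptHash: string,
  accessControlConditions: unknown[]
): string {
  const env: EncryptedEnvelope = { version: 1, ciphertext, dataToEncryptHash, accessControlConditions }
  return JSON.stringify(env)
}

function unpackEnvelope(json: string): EncryptedEnvelope {
  const env = JSON.parse(json) as EncryptedEnvelope
  if (env.version !== 1) throw new Error(`unsupported envelope version: ${env.version}`)
  return env
}

const packed = packEnvelope('ct-base64', 'abc123', [{ chain: 'ethereum' }])
console.log(unpackEnvelope(packed).dataToEncryptHash) // abc123
```

Versioning the envelope up front makes later migration (new condition schema, different encryption mode) painless.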
Cost model and stack selection
| Criterion | IPFS + Cluster | Filecoin Deals | Arweave / Irys | Storj |
|---|---|---|---|---|
| Permanence | While pinned | 1–5 years per deal | Permanent | While paid |
| Read speed | High | Slow (retrieval) | Medium | High |
| Cost | Infrastructure | ~$0.0001/GB/month | ~$10/GB one-time | ~$4/TB/month |
| Censorship resistance | Medium | High | Very high | Medium |
| Search | CID only | No | GraphQL by tags | No |
For NFT projects and DAOs, Arweave via Irys is usually the right choice: metadata volumes are tiny, so the one-time cost is negligible, and permanence is critical. For application data with low-latency requirements: IPFS Cluster with hot replication plus Filecoin for archive copies. For enterprise deployments with compliance requirements: the hybrid scheme with Lit Protocol encryption.