Multi-blockchain data aggregation system

We design and develop full-cycle blockchain solutions: from smart contract architecture to launching DeFi protocols, NFT marketplaces and crypto exchanges. Security audits, tokenomics, integration with existing infrastructure.
Multi-blockchain Data Aggregation System Development

The task looks simple: "collect data from multiple blockchains". In practice, it's one of the most technically complex tasks in Web3 infrastructure. Each network has its own data model, its own finalization logic, its own RPC API, its own rate limits, and its own failure modes. Ethereum produces a block roughly every 12 seconds, Solana delivers ~400ms slots and counts confirmations differently, and TON has a sharded architecture where a "block" is a loose concept. Collecting all of this into a unified API with consistent data is a nontrivial engineering problem.

The problem of heterogeneity: why you can't just "query all RPC"

Different data models

EVM networks (Ethereum, Arbitrum, Polygon, BSC) share a common model: blocks, transactions, receipts with logs. But there are differences even here:

  • Arbitrum adds l1BlockNumber and specific system transactions (sequencer batch submissions)
  • Optimism/Base have a special deposit transaction type for L1→L2 messages, which doesn't follow the standard signed-transaction format
  • zkSync Era uses Native AA — no distinction between EOA and contracts, all accounts are contracts

Solana is a completely different paradigm: instead of "a transaction calls a contract method", a transaction carries instructions that are passed to programs. To decode them you need the ABI's analog — an IDL (Interface Definition Language, e.g. the Anchor format).

UTXO models (Bitcoin, Litecoin) are fundamentally different: there are no account balances, only unspent transaction outputs. An "address balance" is the sum of all unspent outputs locked to that address.
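A minimal sketch of that balance computation (the Utxo shape here is illustrative — real node RPCs such as listunspent return richer objects):

```typescript
// Illustrative UTXO shape; field names are assumptions for this sketch.
interface Utxo {
  txid: string;
  vout: number;          // output index within the transaction
  address: string;       // output recipient
  valueSats: bigint;     // amount in satoshis
  spent: boolean;
}

// "Address balance" in a UTXO model: sum of all unspent outputs
// locked to the address — there is no balance field anywhere on chain.
function utxoBalance(utxos: Utxo[], address: string): bigint {
  return utxos
    .filter((u) => !u.spent && u.address === address)
    .reduce((sum, u) => sum + u.valueSats, 0n);
}
```

Spending works the same way in reverse: a transaction consumes whole UTXOs as inputs and creates new outputs, including change back to the sender.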

Different finalization semantics

Network       Mechanism             Finality
Ethereum      PoS + Casper FFG      ~15 min (finalized checkpoint)
Arbitrum One  Optimistic Rollup     ~7 days (fraud-proof window) for L1 finality
Polygon PoS   Heimdall checkpoints  ~30 min for Ethereum finality
Solana        Tower BFT             ~12–32 slots (~6–16 sec)
Bitcoin       PoW                   6 confirmations (~60 min), the conventional standard

If the system doesn't account for this, data will be incorrect: a transaction may look "final" by confirmation count and still be reorged away.
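The table above can be folded into a per-chain confirmation policy. A sketch with assumed thresholds (the exact numbers would be tuned per network and per product's risk tolerance):

```typescript
type Finality = "unconfirmed" | "safe" | "finalized";

// Assumed per-chain thresholds in confirmations (blocks/slots behind tip).
// "safe" = very unlikely to reorg; "finalized" = protocol-level finality.
const FINALITY_THRESHOLDS: Record<string, { safe: number; finalized: number }> = {
  ethereum: { safe: 32, finalized: 64 },   // ~2 epochs to a finalized checkpoint
  polygon:  { safe: 64, finalized: 256 },  // deep reorgs observed, be conservative
  solana:   { safe: 12, finalized: 32 },   // Tower BFT commitment levels
  bitcoin:  { safe: 3,  finalized: 6 },    // the conventional 6-confirmation rule
};

function classifyFinality(chain: string, confirmations: number): Finality {
  const t = FINALITY_THRESHOLDS[chain];
  if (!t || confirmations < t.safe) return "unconfirmed";
  return confirmations >= t.finalized ? "finalized" : "safe";
}
```

The point is that "confirmations" alone is meaningless cross-chain — the same number means very different things on Bitcoin and on Solana.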

Aggregation system architecture

Collector layer (Chain Collectors)

Each collector is an isolated service responsible for one network. Common interface:

type Unsubscribe = () => void;

interface ChainCollector {
  getLatestBlock(): Promise<UnifiedBlock>;
  getBlockRange(from: bigint, to: bigint): Promise<UnifiedBlock[]>;
  getTransactionsByAddress(address: string, fromBlock: bigint): Promise<UnifiedTx[]>;
  subscribeNewBlocks(callback: (block: UnifiedBlock) => void): Unsubscribe;
}

Unified types normalize each network's specifics:

interface UnifiedTx {
  chain: ChainId;
  hash: string;
  blockNumber: bigint;
  timestamp: number; // unix
  from: string;      // normalized lowercase hex for EVM, base58 for Solana
  to: string | null;
  value: bigint;     // in smallest units of native token
  status: 'success' | 'failed' | 'pending';
  finality: 'unconfirmed' | 'safe' | 'finalized';
  raw: unknown;      // original network data
}
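As an illustration, a minimal EVM-side normalizer producing this shape. The RawEvmTx subset and the normalizeEvmTx helper are assumptions for this sketch; field names follow the standard JSON-RPC transaction and receipt responses, and finality classification is left to a separate step:

```typescript
// Minimal subset of an EVM transaction as returned by JSON-RPC.
interface RawEvmTx {
  hash: string;
  blockNumber: string;   // hex quantity, e.g. "0x10d4f"
  from: string;
  to: string | null;     // null for contract creation
  value: string;         // hex quantity in wei
}

function normalizeEvmTx(
  chain: string,
  tx: RawEvmTx,
  receiptStatus: "0x1" | "0x0",  // from eth_getTransactionReceipt
  blockTimestamp: number,         // unix, from the containing block
) {
  return {
    chain,
    hash: tx.hash.toLowerCase(),
    blockNumber: BigInt(tx.blockNumber),
    timestamp: blockTimestamp,
    from: tx.from.toLowerCase(),            // EVM addresses → lowercase hex
    to: tx.to ? tx.to.toLowerCase() : null,
    value: BigInt(tx.value),                // wei, smallest unit
    status: receiptStatus === "0x1" ? ("success" as const) : ("failed" as const),
    finality: "unconfirmed" as const,       // upgraded later by a finality tracker
    raw: tx,                                // keep the original for debugging
  };
}
```

A Solana or UTXO collector would implement the same output shape from completely different inputs — that is the whole point of the unified types.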

Node and provider management

Problem: public RPC is unreliable, rate limits are unpredictable, Alchemy/Infura get expensive at scale.

Strategy: tiered provider pool

Primary: Own nodes (Geth+Lighthouse, Reth for archive)
  ↓ failover
Secondary: Alchemy / QuickNode (premium tier)
  ↓ failover  
Tertiary: Infura / public RPC (only for non-critical requests)

Circuit breaker on each provider: if error rate > 5% over 60 sec or latency > 2x p99 baseline — remove provider from rotation, health check every 30 sec.
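A sketch of such a breaker, tracking only the error-rate condition (latency tripping and the 30-second health-check loop are omitted; the 20-sample minimum is an assumed warm-up so a single early error doesn't trip the breaker):

```typescript
// Per-provider circuit breaker: trip when error rate > 5% over a 60s window.
// A separate health checker would probe tripped providers and call reset().
class ProviderBreaker {
  private events: { ts: number; ok: boolean }[] = [];
  private tripped = false;

  constructor(private windowMs = 60_000, private maxErrorRate = 0.05) {}

  // Record the outcome of one RPC call; `now` is injectable for testing.
  record(ok: boolean, now = Date.now()): void {
    this.events.push({ ts: now, ok });
    // Drop events that fell out of the sliding window.
    this.events = this.events.filter((e) => now - e.ts <= this.windowMs);
    const errors = this.events.filter((e) => !e.ok).length;
    if (this.events.length >= 20 && errors / this.events.length > this.maxErrorRate) {
      this.tripped = true; // remove provider from rotation
    }
  }

  isAvailable(): boolean { return !this.tripped; }
  reset(): void { this.tripped = false; this.events = []; }
}
```

In production the same state usually lives in the request router, so every call site sees a consistent view of which providers are healthy.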

For archive data (on Ethereum, state older than the most recent 128 blocks) you need an archive node — a separate undertaking. Erigon takes ~3TB for a full Ethereum archive, Reth slightly less. For most projects it's cheaper to use Alchemy Archive or QuickNode Archive than to maintain your own node.

Normalization and transformation layer

Raw blockchain data is rarely needed as-is. Typical transformations:

Decoding ERC-20 Transfer events

const ERC20_TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

function decodeTransfer(log: Log): TokenTransfer | null {
  if (log.topics[0] !== ERC20_TRANSFER_TOPIC) return null;
  // ERC-721 Transfer shares this topic hash but indexes tokenId as topics[3]
  // and carries no data — require exactly 3 topics and a non-empty data field.
  if (log.topics.length !== 3 || log.data === "0x") return null;
  return {
    token: log.address,
    from: `0x${log.topics[1].slice(26)}`, // indexed address: last 20 bytes of the topic
    to: `0x${log.topics[2].slice(26)}`,
    amount: BigInt(log.data),
  };
}

Token data enrichment: for each log.address you need symbol, decimals, and a USD price. Cache token metadata in Redis with a 24h TTL; update prices every 30 sec from CoinGecko/CoinMarketCap.
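An in-memory TTL cache sketch as a stand-in for the Redis cache described above (the TokenMeta shape and TtlCache class are illustrative; in production the same get/set semantics map onto Redis GET/SET with EX):

```typescript
interface TokenMeta { symbol: string; decimals: number }

// Generic TTL cache; `now` is injectable so expiry is testable.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= now) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Metadata changes rarely → 24h TTL; prices would use a separate ~30s cache.
const metaCache = new TtlCache<TokenMeta>(24 * 3600 * 1000);
```

The two-tier split (long-lived metadata, short-lived prices) keeps price-feed API usage bounded regardless of how many logs you decode.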

Cross-chain aggregation: to show "total address balance across all networks in USD" you need to normalize different decimals, convert through price feeds, and handle wrapped versions of the same token (USDC on Ethereum ≠ USDC.e on Arbitrum).
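A sketch of that aggregation under the assumption that wrapped variants have already been mapped to a canonical token id (the ChainBalance shape is illustrative):

```typescript
// One balance entry per (chain, token); each carries its own decimals,
// because the "same" token can differ across chains.
interface ChainBalance {
  chain: string;
  token: string;       // canonical id after wrapped-variant mapping
  rawAmount: bigint;   // smallest units (wei, lamports, token base units)
  decimals: number;
  priceUsd: number;    // from the price feed
}

function totalUsd(balances: ChainBalance[]): number {
  return balances.reduce((sum, b) => {
    // Number conversion is fine for display purposes; accounting-grade
    // systems should use a decimal library instead of floats.
    const units = Number(b.rawAmount) / 10 ** b.decimals;
    return sum + units * b.priceUsd;
  }, 0);
}
```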

Storage layer

For hot data (last 7–30 days): PostgreSQL with partitioning by chain_id + date. Indexes on (chain_id, address, block_number) and (chain_id, tx_hash). TimescaleDB hypertables for high-volume time-series data, with automatic compression of old partitions.

For cold data (archive): ClickHouse — columnar database, order of magnitude more efficient than PostgreSQL for analytical queries over large periods. Query "all USDC transactions > $10k during 2023 across all EVM networks" on 100M+ rows — ClickHouse gives result in seconds, PostgreSQL in minutes.

For address/hash search: Elasticsearch, or simply PostgreSQL — a hash index is sufficient for exact-match lookups.

Reorg handling

This is the trickiest part of the system. Algorithm:

  1. Save each block with flag is_canonical = true and parent_hash
  2. New block with same block_number but different hash — potential reorg
  3. Walk back via parent_hash until common ancestor is found
  4. Mark all blocks on "old" branch as is_canonical = false, add blocks from "new" branch
  5. Output API always filters by is_canonical = true
  6. Webhooks/downstream systems receive tx.orphaned events for revoked transactions

On Ethereum, reorg depth is rarely more than 2 blocks post-Merge. On Polygon PoS, reorgs of 30+ blocks have been observed. Keep an observation buffer of 128 blocks for EVM networks.
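The algorithm above can be sketched against an in-memory block index. This is a simplified version: it assumes the incoming block's ancestry is already stored, and a real implementation would run the demotion and insertion inside one database transaction:

```typescript
interface StoredBlock {
  number: bigint;
  hash: string;
  parentHash: string;
  isCanonical: boolean;
}

// Returns the hashes of orphaned blocks, whose transactions should
// produce tx.orphaned events for downstream consumers.
function handleNewBlock(
  byHash: Map<string, StoredBlock>,
  byNumber: Map<bigint, StoredBlock>,   // canonical block per height
  incoming: StoredBlock,
): string[] {
  const existing = byNumber.get(incoming.number);
  const orphaned: string[] = [];
  if (existing && existing.hash !== incoming.hash) {
    // Potential reorg: walk the old branch back via parentHash until it
    // reconnects with the new block's ancestry, demoting blocks as we go.
    let oldTip: StoredBlock | undefined = existing;
    while (oldTip && oldTip.hash !== incoming.parentHash) {
      oldTip.isCanonical = false;
      orphaned.push(oldTip.hash);
      oldTip = byHash.get(oldTip.parentHash);
    }
  }
  byHash.set(incoming.hash, incoming);
  byNumber.set(incoming.number, incoming); // new branch becomes canonical
  return orphaned;
}
```

Because the output API filters by is_canonical, readers never see a half-switched chain even while the walk-back is in progress.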

API layer

REST + WebSocket for real-time:

GET /v1/address/{address}/transactions?chains=eth,arb,polygon&limit=50
GET /v1/tx/{chain}/{hash}
GET /v1/address/{address}/token-balances?chains=eth,bsc
WS  /v1/subscribe?address={addr}&chains=eth,arb&events=transfer,swap

GraphQL is convenient if clients need query flexibility: one request fetches transactions + balances + token metadata. But it adds backend complexity — N+1 problems that need DataLoader-style batching.
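The batching idea can be sketched without the dataloader package — collect all keys requested in the same tick and resolve them with one batch call (this BatchLoader class is a minimal illustration, not the real library's API):

```typescript
// Minimal DataLoader-style batcher: individual load() calls made in the
// same microtask turn are coalesced into a single batchFn invocation.
class BatchLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  private scheduled = false;

  constructor(private batchFn: (keys: K[]) => Promise<V[]>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        queueMicrotask(() => this.flush()); // flush after the current tick
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    // One backend round-trip for the whole batch instead of N.
    const values = await this.batchFn(batch.map((b) => b.key));
    batch.forEach((b, i) => b.resolve(values[i]));
  }
}
```

With a resolver like "token metadata by address", 50 transactions in one GraphQL response then cost one metadata lookup instead of 50.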

Rate limiting: per-API-key, sliding window, separate limits for REST and WebSocket (WebSocket connections are more expensive). Redis + Lua script for atomic increments.
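An in-memory sketch of the sliding-window check; in production this state lives in Redis and the check-and-increment runs inside a Lua script so it stays atomic across API nodes:

```typescript
// Sliding-window rate limiter: at most `limit` requests per `windowMs`
// per API key, counted over a true sliding window (not fixed buckets).
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // apiKey -> request timestamps

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(apiKey: string, now = Date.now()): boolean {
    const recent = (this.hits.get(apiKey) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    if (recent.length >= this.limit) {
      this.hits.set(apiKey, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(apiKey, recent);
    return true;
  }
}
```

The Redis version keeps the same logic with a sorted set per key: ZREMRANGEBYSCORE to drop old timestamps, ZCARD to count, ZADD to record — all inside one EVAL.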

Monitoring and operations

Critical metrics:

  • Collector lag — difference between the network's latest block timestamp and the last block processed by our system. Alert when lag > 2 min.
  • Reorg depth — max reorg depth over last 24h. Alert when depth > 10.
  • RPC error rate — per provider and method. Alert when > 1%.
  • Queue depth — if processor can't keep up with collector, queue grows. Alert when depth > 10k messages.
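The lag metric from the list above reduces to a simple computation; a sketch (the ChainStatus shape is illustrative, and the 120-second threshold matches the 2-minute alert rule):

```typescript
// Per-chain status as the monitoring exporter would see it (unix seconds).
interface ChainStatus {
  latestBlockTimestamp: number;   // head of the chain, from the collector
  lastProcessedTimestamp: number; // last block fully written to storage
}

function collectorLagSeconds(s: ChainStatus): number {
  // Clamp at zero: clock skew can briefly make "processed" look newer.
  return Math.max(0, s.latestBlockTimestamp - s.lastProcessedTimestamp);
}

function shouldAlert(s: ChainStatus, thresholdSec = 120): boolean {
  return collectorLagSeconds(s) > thresholdSec;
}
```

Exported as a Prometheus gauge per chain, this single number catches most pipeline failures: a stuck collector, a full queue, or a slow writer all surface as growing lag.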

Grafana dashboard with per-chain panels: current block, lag, TPS, error rate.

Stack

Component      Technology
Collectors     Node.js (viem/ethers) + Go for high-load networks
Queue          Apache Kafka (high throughput) or RabbitMQ (moderate)
Hot storage    PostgreSQL 15 + TimescaleDB
Cold storage   ClickHouse
Cache          Redis Cluster
API            Node.js (Fastify) or Go (Fiber)
Monitoring     Prometheus + Grafana + PagerDuty
Orchestration  Kubernetes with HPA on collectors

Realistic MVP timeline (3–4 EVM networks, no archive, REST API): 8–12 weeks. Complete system with 10+ networks, ClickHouse, WebSocket, monitoring — 5–7 months.