Parsing on-chain data (transactions, balances, contracts)
When a client says they "need blockchain parsing", they usually mean one of three things: historical data for analysis, real-time monitoring of specific addresses, or building an indexed database for their own product. The technical approach differs for each case. What they share: you need the right data source and the understanding that eth_getLogs is not "all the data".
What you can get from blockchain
Block-level transactions (eth_getBlockByNumber with fullTx: true):
- From/to/value/gas/gasPrice/nonce
- Input data (calldata in hex)
- Receipt: status (success/reverted), gasUsed, logs (events)
Internal transactions (calls between contracts) are not visible as regular transactions. They require debug_traceTransaction or trace_block (the Erigon/OpenEthereum trace namespace). This matters: an ETH transfer inside a DeFi protocol doesn't create a top-level transaction and is visible only in traces.
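A sketch of extracting ETH-moving internal calls with debug_traceTransaction and Geth's callTracer (assumes a node exposing the debug namespace; CallFrame mirrors Geth's documented callTracer output, and the helper names are illustrative):

```typescript
// Geth-style callTracer frame (nested call tree)
interface CallFrame {
  type: string;    // CALL, DELEGATECALL, STATICCALL, CREATE, ...
  from: string;
  to?: string;
  value?: string;  // hex wei; internal ETH transfers appear here
  calls?: CallFrame[];
}

// Flatten the call tree, keeping only frames that actually move ETH
export function collectValueTransfers(root: CallFrame): CallFrame[] {
  const out: CallFrame[] = [];
  const walk = (f: CallFrame) => {
    if (f.value && BigInt(f.value) > 0n) out.push(f);
    f.calls?.forEach(walk);
  };
  walk(root);
  return out;
}

// Raw JSON-RPC call; needs a node with the debug namespace (Geth/Erigon)
export async function traceTransaction(rpcUrl: string, txHash: string): Promise<CallFrame> {
  const res = await fetch(rpcUrl, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: 1,
      method: 'debug_traceTransaction',
      params: [txHash, { tracer: 'callTracer' }],
    }),
  });
  const { result } = await res.json();
  return result as CallFrame;
}
```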
Events (logs) are emitted via emit Event(...) in Solidity and available through eth_getLogs. This is the most efficient parsing method: filtering by address + topic happens at the node level.
Storage state: contract storage variable values via eth_getStorageAt(address, slot, blockNumber). On an archive node, any historical block. You need to know the slot layout (from solc's storage layout output).
ERC-20 balances: via a balanceOf(address) view call or by replaying Transfer event history.
ENS / identity — reverse resolution through ENS registry contract.
Choosing data source
| Source | Provides | Limitations |
|---|---|---|
| Public RPC (Infura/Alchemy) | Standard JSON-RPC | Rate limits, no traces |
| Self-hosted Geth | Full JSON-RPC | Historical traces require --gcmode=archive |
| Self-hosted Erigon | JSON-RPC + trace namespace | ~2.5 TB, 3-5 days sync |
| Alchemy/QuickNode (paid plans) | Extended API + traces | Cost at high RPS |
| Firehose (StreamingFast) | Binary stream, all data | Complex setup |
| Dune Analytics / Flipside | SQL interface to indexed data | Lag, schema limits |
For most parsing tasks, Alchemy or QuickNode on a paid plan is the optimal start: no infrastructure burden. For high volume or specific data (traces, storage), self-hosted Erigon.
Transaction parsing
Basic block parser
```typescript
import { createPublicClient, http } from 'viem';

const client = createPublicClient({
  transport: http(RPC_URL),
});

async function processBlock(blockNumber: bigint) {
  const block = await client.getBlock({
    blockNumber,
    includeTransactions: true,
  });
  for (const tx of block.transactions) {
    if (typeof tx === 'string') continue; // hash-only mode
    await db.insertTransaction({
      hash: tx.hash,
      blockNumber: Number(tx.blockNumber),
      blockTimestamp: Number(block.timestamp),
      from: tx.from,
      to: tx.to,
      value: tx.value.toString(),        // wei as a decimal string
      gasPrice: tx.gasPrice?.toString(), // may be absent on typed transactions
      gasLimit: tx.gas.toString(),
      input: tx.input,
      nonce: tx.nonce,
    });
  }
}
```
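For the real-time-monitoring case from the intro, the same processBlock can be driven by a head-follower loop (a sketch; followHead and stopAt are hypothetical names, and viem's built-in watchBlockNumber is the event-driven alternative):

```typescript
// Poll the chain head and process every new block in order.
// stopAt exists so the loop can terminate deterministically in tests.
export async function followHead(
  getHead: () => Promise<bigint>,
  process: (n: bigint) => Promise<void>,
  startAt: bigint,
  opts: { pollMs?: number; stopAt?: bigint } = {},
): Promise<void> {
  const { pollMs = 4000, stopAt } = opts;
  let next = startAt;
  for (;;) {
    const head = await getHead();
    while (next <= head) {
      await process(next); // sequential: preserves block order for the DB
      next += 1n;
    }
    if (stopAt !== undefined && next > stopAt) return;
    await new Promise(r => setTimeout(r, pollMs));
  }
}
```

Usage would be `followHead(() => client.getBlockNumber(), processBlock, startBlock)`.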
Getting receipts
A receipt contains the execution status, actual gasUsed, and the logs (events). Batch request:
```typescript
// Get receipts for the entire block in one request (Alchemy/QuickNode, Erigon)
const receipts = await client.request({
  method: 'eth_getBlockReceipts',
  params: [`0x${blockNumber.toString(16)}`], // block number as a hex quantity
});

// Or one request per TX (standard JSON-RPC)
const receipt = await client.getTransactionReceipt({ hash: tx.hash });
```
eth_getBlockReceipts was non-standard for a long time and is available on Alchemy, QuickNode, and Erigon; recent Geth versions support it too. On nodes without it you need N requests per block.
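On nodes without eth_getBlockReceipts, the per-TX fallback can at least be parallelized with bounded concurrency (a hypothetical helper; the cap keeps the request rate under provider limits):

```typescript
// Run fetchOne over all hashes, at most `concurrency` requests in flight
export async function fetchReceipts<T>(
  hashes: string[],
  fetchOne: (hash: string) => Promise<T>,
  concurrency = 10,
): Promise<T[]> {
  const out: T[] = [];
  for (let i = 0; i < hashes.length; i += concurrency) {
    const batch = hashes.slice(i, i + concurrency);
    out.push(...(await Promise.all(batch.map(fetchOne))));
  }
  return out; // same order as the input hashes
}
```

Usage: `fetchReceipts(txHashes, hash => client.getTransactionReceipt({ hash }))`.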
Event parsing (logs)
The most efficient method when you need specific data:

```typescript
import { parseAbiItem } from 'viem';

// Parse all ERC-20 Transfer events on a specific contract
const logs = await client.getLogs({
  address: TOKEN_ADDRESS,
  event: parseAbiItem('event Transfer(address indexed from, address indexed to, uint256 value)'),
  fromBlock: 19_000_000n,
  toBlock: 19_100_000n,
});

for (const log of logs) {
  await db.insertTransfer({
    txHash: log.transactionHash,
    blockNumber: Number(log.blockNumber),
    from: log.args.from,
    to: log.args.to,
    value: log.args.value.toString(),
  });
}
```
eth_getLogs limitations: most public nodes cap the block range per request (commonly around 2000 blocks, provider-dependent). For historical data you need chunked polling:
```typescript
const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));

async function fetchLogsInChunks(
  fromBlock: number,
  toBlock: number,
  chunkSize = 1000,
) {
  for (let from = fromBlock; from <= toBlock; from += chunkSize) {
    const to = Math.min(from + chunkSize - 1, toBlock);
    const logs = await client.getLogs({
      address: CONTRACT,
      fromBlock: BigInt(from),
      toBlock: BigInt(to),
    });
    await processLogs(logs);
    await sleep(100); // rate limiting: pause so we don't exhaust provider limits
  }
}
```
Balances
Current balance
ERC-20 balance through a view call (erc20Abi ships with viem):

```typescript
import { erc20Abi } from 'viem';

const balance = await client.readContract({
  address: TOKEN_ADDRESS,
  abi: erc20Abi,
  functionName: 'balanceOf',
  args: [walletAddress],
});
```
Historical balance
Two approaches:
Computing from Transfer events: accumulate all Transfers to/from the address and maintain a running balance. Accurate, but requires the full event history.
eth_call on a historical block: call balanceOf with blockNumber pinned to a past block. Direct, but requires an archive node.
```typescript
const historicalBalance = await client.readContract({
  address: TOKEN_ADDRESS,
  abi: erc20Abi,
  functionName: 'balanceOf',
  args: [walletAddress],
  blockNumber: 18_500_000n, // historical block: archive node required
});
```
Multicall for batching
For fetching balances of many addresses at once (multicall is a method on the viem client, not a separate import):

```typescript
const balances = await client.multicall({
  contracts: addresses.map(addr => ({
    address: TOKEN_ADDRESS,
    abi: erc20Abi,
    functionName: 'balanceOf',
    args: [addr],
  })),
  allowFailure: true, // failed calls return a failure status instead of throwing
});
```
One HTTP request instead of N: a 10-100x saving on rate limits.
Contract parsing
ABI retrieval through the Etherscan API (or its forks for other networks):

```typescript
// d.result is the ABI as a JSON string (verified contracts only)
const abi = await fetch(
  `https://api.etherscan.io/api?module=contract&action=getabi&address=${address}&apikey=${ETHERSCAN_KEY}`,
).then(r => r.json()).then(d => JSON.parse(d.result));
```
For unverified contracts, 4byte.directory helps decode function signatures from selectors.
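A sketch of selector extraction and lookup (the 4byte.directory endpoint shown is their public API; selectorOf and lookupSelector are hypothetical helpers):

```typescript
// First 4 bytes of calldata = the function selector
export function selectorOf(input: string): string {
  return input.slice(0, 10); // '0x' + 8 hex chars
}

// Look up human-readable signatures on 4byte.directory.
// Selectors collide, so multiple candidates may come back.
export async function lookupSelector(selector: string): Promise<string[]> {
  const res = await fetch(
    `https://www.4byte.directory/api/v1/signatures/?hex_signature=${selector}`,
  );
  const data = await res.json();
  return data.results.map((r: { text_signature: string }) => r.text_signature);
}
```

For example, ERC-20 transfer(address,uint256) has selector 0xa9059cbb.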
Bytecode analysis: eth_getCode returns the deployed bytecode. You can check whether an address is a contract (code is non-empty), whether it's a proxy (EIP-1967 slots), and compare bytecode hashes.
Storage layout: solc --storage-layout gives the variable-to-slot mapping. Then eth_getStorageAt reads values directly, no ABI needed.
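For the proxy check, a sketch reading the EIP-1967 implementation slot over raw JSON-RPC (the slot constant is defined in EIP-1967; the helper names are illustrative):

```typescript
// EIP-1967 implementation slot: keccak256('eip1967.proxy.implementation') - 1
export const IMPLEMENTATION_SLOT =
  '0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc';

// A storage word is 32 bytes; an address sits right-aligned in the last 20
export function addressFromWord(word: string): string {
  return '0x' + word.slice(-40);
}

// Read a proxy's implementation address via eth_getStorageAt
export async function getProxyImplementation(
  rpcUrl: string,
  proxy: string,
): Promise<string | null> {
  const res = await fetch(rpcUrl, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: 1,
      method: 'eth_getStorageAt',
      params: [proxy, IMPLEMENTATION_SLOT, 'latest'],
    }),
  });
  const { result } = await res.json();
  if (!result || BigInt(result) === 0n) return null; // zero word: not an EIP-1967 proxy
  return addressFromWord(result);
}
```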
Multi-network parsing
The same code should work across different networks. Key differences:
| Parameter | Ethereum | BNB Chain | Polygon | Arbitrum |
|---|---|---|---|---|
| Block time | ~12 sec | ~3 sec | ~2 sec | ~0.25 sec |
| Log chunk limit (typical) | 2000 blocks | 5000 | 3500 | 10000 |
| Native decimals | 18 | 18 | 18 | 18 |
| Trace API | Erigon/Besu | Node with debug | Limited | Standard |
```typescript
const CHAIN_CONFIGS = {
  ethereum: { rpc: INFURA_ETH, chunkSize: 1000, blockTime: 12 },
  bsc:      { rpc: BSC_RPC, chunkSize: 2000, blockTime: 3 },
  polygon:  { rpc: POLYGON_RPC, chunkSize: 1500, blockTime: 2 },
  arbitrum: { rpc: ARB_RPC, chunkSize: 5000, blockTime: 0.25 },
};
```
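The per-chain chunkSize can feed a generic chunk planner shared across all networks (planChunks is a hypothetical helper matching the chunked-polling logic from the events section):

```typescript
// Split [fromBlock, toBlock] into chain-appropriate ranges for eth_getLogs
export function planChunks(
  fromBlock: number,
  toBlock: number,
  chunkSize: number,
): Array<{ from: number; to: number }> {
  const chunks: Array<{ from: number; to: number }> = [];
  for (let from = fromBlock; from <= toBlock; from += chunkSize) {
    chunks.push({ from, to: Math.min(from + chunkSize - 1, toBlock) });
  }
  return chunks;
}
```

Usage: `planChunks(start, end, CHAIN_CONFIGS[chain].chunkSize)`, then one getLogs call per chunk.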
Performance and storage
Data volume grows fast. For Ethereum:
- ~7200 blocks/day (12 s block time) × ~150-200 TX/block ≈ 1-1.5M transactions/day
- Each transaction with its receipt: ~1-5 KB
- Total: roughly 1-7 GB/day for full transaction-level parsing
Most tasks don't need full parsing, only the target contracts and events.
Storage: PostgreSQL for normalized data + TimescaleDB hypertables for time-series + S3 for raw JSON archive.
Indexes are critical:

```sql
CREATE INDEX CONCURRENTLY ON transactions (from_address, block_number DESC);
CREATE INDEX CONCURRENTLY ON transactions (to_address, block_number DESC);
CREATE INDEX CONCURRENTLY ON transfers (token_address, block_number DESC);
```
Stack and timeline
A parser supporting transactions + events + balances for 2-3 networks, with PostgreSQL storage and a basic REST API: 2-4 weeks, depending on historical data depth and the number of target contracts.