On-Chain Data Scraping (Transactions, Balances, Contracts)


When someone says they "need blockchain parsing", they usually mean one of three things: historical data for analysis, real-time monitoring of specific addresses, or building an indexed database for their own product. The technical approach differs for each case. What they share: you need the right data source, and the understanding that eth_getLogs is not "all the data".

What you can get from blockchain

Block-level transactions (eth_getBlockByNumber with fullTx: true):

  • From/to/value/gas/gasPrice/nonce
  • Input data (calldata in hex)
  • Receipt: status (success/reverted), gasUsed, logs (events)

Internal transactions (calls between contracts) are not visible among regular transactions — you need debug_traceTransaction or trace_block (the Erigon/OpenEthereum trace namespace). This matters: an ETH transfer inside a DeFi protocol does not create a regular transaction and is visible only in traces.
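As a sketch of what consuming traces looks like: assuming a debug_traceTransaction call with the callTracer has already returned its nested call frames, internal ETH transfers can be pulled out by walking the frame tree. The `CallFrame` shape mirrors Geth's callTracer output; this minimal version counts only value-carrying CALL frames (CALLCODE and CREATE can also move value and would need the same treatment):

```typescript
// Shape of a Geth callTracer frame (simplified to the fields used here).
interface CallFrame {
  type: string;          // CALL, DELEGATECALL, STATICCALL, CREATE, ...
  from: string;
  to?: string;
  value?: string;        // hex-encoded wei; absent for STATICCALL
  calls?: CallFrame[];   // nested sub-calls
}

interface InternalTransfer {
  from: string;
  to: string;
  value: bigint;
}

function extractInternalTransfers(trace: CallFrame): InternalTransfer[] {
  const out: InternalTransfer[] = [];
  const walk = (f: CallFrame) => {
    // Only CALL frames with non-zero value actually move ETH;
    // DELEGATECALL/STATICCALL never transfer value.
    if (f.type === 'CALL' && f.to && f.value && BigInt(f.value) > 0n) {
      out.push({ from: f.from, to: f.to, value: BigInt(f.value) });
    }
    for (const sub of f.calls ?? []) walk(sub);
  };
  // Skip the top-level frame: that is the regular transaction itself.
  for (const sub of trace.calls ?? []) walk(sub);
  return out;
}
```

The same walk works for trace_transaction output from Erigon after mapping its flat trace list back into a tree.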

Events (logs) — emitted via emit Event(...) in Solidity and available through eth_getLogs. This is the most efficient parsing method: filtering by address + topic happens at the node level.

Storage state — contract storage variable values via eth_getStorageAt(address, slot, blockNumber). With an archive node, any historical block is available. You need to know the slot layout (derived from the ABI + solc output).

ERC-20 balances — via a balanceOf(address) view call, or reconstructed from Transfer event history.

ENS / identity — reverse resolution through ENS registry contract.

Choosing data source

Source | Provides | Limitations
Public RPC (Infura/Alchemy) | Standard JSON-RPC | Rate limits, no traces
Self-hosted Geth | Full JSON-RPC | Historical traces only with --gcmode=archive
Self-hosted Erigon | JSON-RPC + trace namespace | ~2.5 TB disk, 3-5 day initial sync
Alchemy/QuickNode (paid plans) | Extended API + traces | Cost at high RPS
Firehose (StreamingFast) | Binary stream, all data | Complex setup
Dune Analytics / Flipside | SQL interface to indexed data | Data lag, schema limits

For most parsing tasks, Alchemy or QuickNode on a paid plan is the optimal starting point: no infrastructure burden. For high volume or for specific data (traces, storage), self-hosted Erigon.

Transaction parsing

Basic block parser

import { createPublicClient, http } from 'viem';

// RPC_URL and db are app-specific: any JSON-RPC endpoint and any storage layer.
const client = createPublicClient({
  transport: http(RPC_URL),
});

async function processBlock(blockNumber: bigint) {
  const block = await client.getBlock({
    blockNumber,
    includeTransactions: true, // full transaction objects, not just hashes
  });

  for (const tx of block.transactions) {
    if (typeof tx === 'string') continue; // hash-only mode

    await db.insertTransaction({
      hash: tx.hash,
      blockNumber: Number(tx.blockNumber),
      blockTimestamp: Number(block.timestamp),
      from: tx.from,
      to: tx.to,                         // null for contract creation
      value: tx.value.toString(),        // bigint -> string for storage
      gasPrice: tx.gasPrice?.toString(), // may be undefined for EIP-1559 txs
      gasLimit: tx.gas.toString(),
      input: tx.input,
      nonce: tx.nonce,
    });
  }
}
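Beyond backfilling, a live parser has to follow the chain head without indexing blocks that may still be reorged. A minimal sketch of that decision: stay a fixed number of confirmations behind the head and cap each polling batch (the parameter names and defaults here are illustrative assumptions, not from the text):

```typescript
// Decide which block range is safe to process next, staying `confirmations`
// blocks behind the chain head so reorged blocks are never indexed.
function nextBlockRange(
  head: bigint,           // latest block number from eth_blockNumber
  lastProcessed: bigint,  // highest block already stored in the DB
  confirmations = 12n,    // how far behind the head to stay
  maxBatch = 100n,        // cap per polling iteration
): { from: bigint; to: bigint } | null {
  const safeHead = head - confirmations;
  if (safeHead <= lastProcessed) return null; // nothing safe to process yet
  const from = lastProcessed + 1n;
  const capped = from + maxBatch - 1n;
  return { from, to: capped < safeHead ? capped : safeHead };
}
```

The returned range feeds directly into a loop over processBlock; returning null simply means "sleep one block time and poll again".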

Getting receipts

A receipt contains the execution status, the actual gasUsed, and the logs (events). Batch request:

// Get receipts for the entire block in one request (where supported)
const receipts = await client.request({
  method: 'eth_getBlockReceipts',
  params: [toHex(blockNumber)], // toHex from 'viem': block number as a hex quantity
});

// Or separately for each TX (standard JSON-RPC)
const receipt = await client.getTransactionReceipt({ hash: tx.hash });

eth_getBlockReceipts was long a non-standard method, available on Alchemy, QuickNode, and Erigon; recent Geth versions support it as well. Where it is missing, you need N separate eth_getTransactionReceipt requests per block.
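Since support varies by provider, a common pattern is to try the batch method and fall back to per-transaction requests. A minimal sketch, with the JSON-RPC caller injected as a plain function (`rpc` is an assumption for illustration, not a viem API):

```typescript
// Injected JSON-RPC caller: any transport that takes (method, params).
type RpcCall = (method: string, params: unknown[]) => Promise<unknown>;

async function getBlockReceipts(
  rpc: RpcCall,
  blockNumberHex: string,  // hex quantity, e.g. '0x121eac0'
  txHashes: string[],
): Promise<unknown[]> {
  try {
    const receipts = await rpc('eth_getBlockReceipts', [blockNumberHex]);
    if (Array.isArray(receipts)) return receipts;
  } catch {
    // Method not supported on this node — fall through to per-tx requests.
  }
  return Promise.all(
    txHashes.map((hash) => rpc('eth_getTransactionReceipt', [hash])),
  );
}
```

Caching which path a given endpoint supports avoids paying for the failed probe on every block.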

Event parsing (logs)

Most efficient parsing method for specific data:

import { parseAbiItem } from 'viem';

// Parse all ERC-20 Transfer events on a specific contract
const logs = await client.getLogs({
  address: TOKEN_ADDRESS,
  event: parseAbiItem('event Transfer(address indexed from, address indexed to, uint256 value)'),
  fromBlock: 19_000_000n,
  toBlock: 19_100_000n,
});

for (const log of logs) {
  await db.insertTransfer({
    txHash: log.transactionHash,
    blockNumber: Number(log.blockNumber),
    from: log.args.from,
    to: log.args.to,
    value: log.args.value.toString(),
  });
}

eth_getLogs limitations: most public providers cap the block range per request (commonly around 2000 blocks; the exact limit varies by provider). Historical data therefore has to be fetched in chunks:

async function fetchLogsInChunks(
  fromBlock: number,
  toBlock: number,
  chunkSize = 1000,
) {
  for (let from = fromBlock; from <= toBlock; from += chunkSize) {
    const to = Math.min(from + chunkSize - 1, toBlock);
    const logs = await client.getLogs({
      address: CONTRACT,
      fromBlock: BigInt(from),
      toBlock: BigInt(to),
    });
    await processLogs(logs);
    // Rate limiting: pause to not exhaust limits
    await sleep(100);
  }
}
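Fixed-size chunks still fail when a busy range matches too many logs at once ("query returned more than N results" style errors). A common refinement, sketched here with an injected `fetchLogs` function (an assumption for illustration, not a viem API), is to halve the range on error and retry:

```typescript
// Injected log fetcher: any wrapper around client.getLogs for one contract.
type FetchLogs = (fromBlock: number, toBlock: number) => Promise<unknown[]>;

async function fetchLogsAdaptive(
  fetchLogs: FetchLogs,
  fromBlock: number,
  toBlock: number,
): Promise<unknown[]> {
  try {
    return await fetchLogs(fromBlock, toBlock);
  } catch (err) {
    // A single block that still fails is a real error, not a size problem.
    if (fromBlock >= toBlock) throw err;
    const mid = Math.floor((fromBlock + toBlock) / 2);
    const left = await fetchLogsAdaptive(fetchLogs, fromBlock, mid);
    const right = await fetchLogsAdaptive(fetchLogs, mid + 1, toBlock);
    return [...left, ...right];
  }
}
```

In production this is usually combined with inspecting the error message, since providers also throw for rate limits, where halving the range does not help.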

Balances

Current balance

ERC-20 balance via a view call:

import { erc20Abi } from 'viem'; // viem ships a standard ERC-20 ABI

const balance = await client.readContract({
  address: TOKEN_ADDRESS,
  abi: erc20Abi,
  functionName: 'balanceOf',
  args: [walletAddress],
});

Historical balance

Two approaches:

Computing from Transfer events — accumulate all Transfers to/from the address and keep a running balance. Accurate for standard tokens, but requires the full event history.
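A minimal sketch of this accumulation, assuming the Transfer events are already fetched and sorted by block number (standard tokens only — rebasing or fee-on-transfer tokens break event-based accounting):

```typescript
// One ERC-20 Transfer event, as decoded from eth_getLogs.
interface TransferEvent {
  blockNumber: number;
  from: string;
  to: string;
  value: bigint;
}

// Running balance of `wallet` as of the end of `blockNumber`,
// computed from its full Transfer history (sorted by block).
function balanceAtBlock(
  transfers: TransferEvent[],
  wallet: string,
  blockNumber: number,
): bigint {
  const w = wallet.toLowerCase();
  let balance = 0n;
  for (const t of transfers) {
    if (t.blockNumber > blockNumber) break; // past the target block
    if (t.to.toLowerCase() === w) balance += t.value;
    if (t.from.toLowerCase() === w) balance -= t.value;
  }
  return balance;
}
```

Mints and burns need no special casing: they appear as transfers from/to the zero address and fall out of the same arithmetic.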

eth_call at a historical block — call balanceOf with a blockNumber tag. Direct, but requires an archive node.

const historicalBalance = await client.readContract({
  address: TOKEN_ADDRESS,
  abi: erc20Abi,
  functionName: 'balanceOf',
  args: [walletAddress],
  blockNumber: 18_500_000n,  // historical block
});

Multicall for batching

For getting balances of many addresses:

// client.multicall batches the calls through the deployed Multicall3 contract
const balances = await client.multicall({
  contracts: addresses.map(addr => ({
    address: TOKEN_ADDRESS,
    abi: erc20Abi,
    functionName: 'balanceOf',
    args: [addr],
  })),
  allowFailure: true, // failed calls come back as { status: 'failure' } instead of throwing
});

One HTTP request instead of N — a 10-100x saving on rate limits.

Contract parsing

ABI retrieval through Etherscan API (or forks for other networks):

const abi = await fetch(
  `https://api.etherscan.io/api?module=contract&action=getabi&address=${address}&apikey=${ETHERSCAN_KEY}`
).then(r => r.json()).then(d => JSON.parse(d.result));

For unverified contracts — 4byte.directory for function signature decoding.

Bytecode analysis — eth_getCode returns the deployed bytecode. It lets you check whether an address is a contract (code is non-empty), whether it is a proxy (EIP-1967 slot), or compare bytecode hashes.
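For the proxy check specifically, here is a sketch of decoding the raw storage word that eth_getStorageAt returns for the EIP-1967 implementation slot (the well-known constant keccak256("eip1967.proxy.implementation") - 1); the address sits in the low 20 bytes of the word:

```typescript
// EIP-1967 implementation slot, the same on every chain.
const EIP1967_IMPL_SLOT =
  '0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc';

// Extract the implementation address from the 32-byte storage word,
// or return null when the slot is unset (i.e. not an EIP-1967 proxy).
function implementationFromSlotValue(word: string): string | null {
  const hex = word.replace(/^0x/, '').padStart(64, '0');
  const addr = '0x' + hex.slice(24); // last 20 bytes = 40 hex chars
  return /^0x0{40}$/.test(addr) ? null : addr;
}
```

The input word is whatever eth_getStorageAt(proxyAddress, EIP1967_IMPL_SLOT, 'latest') returns; beacon proxies and older patterns (e.g. OpenZeppelin's pre-1967 slots) use different slots and need separate checks.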

Storage layout — solc --storage-layout gives the mapping of variables to storage slots. With it, eth_getStorageAt can read values directly, without an ABI.

Multi-network parsing

The same code should work across networks. Key differences:

Parameter | Ethereum | BNB Chain | Polygon | Arbitrum
Block time | ~12 sec | ~3 sec | ~2 sec | ~0.25 sec
Log chunk limit | 2000 blocks | 5000 | 3500 | 10000
Native decimals | 18 | 18 | 18 | 18
Trace API | Erigon/Besu | Node with debug | Limited | Standard

const CHAIN_CONFIGS = {
  ethereum: { rpc: INFURA_ETH, chunkSize: 1000, blockTime: 12 },
  bsc:      { rpc: BSC_RPC,    chunkSize: 2000, blockTime: 3 },
  polygon:  { rpc: POLYGON_RPC, chunkSize: 1500, blockTime: 2 },
  arbitrum: { rpc: ARB_RPC,    chunkSize: 5000, blockTime: 0.25 },
};

Performance and storage

Data volume grows fast. For Ethereum:

  • ~7200 blocks/day × ~150-200 TX/block ≈ ~1-1.5M transactions/day
  • Each transaction with its receipt: ~1-5 KB
  • Total: roughly 1-7 GB/day for full transaction-level parsing (traces and full log history add more)

Most tasks don't require full parsing — only target contracts and events.

Storage: PostgreSQL for normalized data + TimescaleDB hypertables for time-series + S3 for raw JSON archive.

Indexes critical:

CREATE INDEX CONCURRENTLY ON transactions (from_address, block_number DESC);
CREATE INDEX CONCURRENTLY ON transactions (to_address, block_number DESC);
CREATE INDEX CONCURRENTLY ON transfers (token_address, block_number DESC);

Stack and timeline

A parser supporting transactions + events + balances across 2-3 networks, with PostgreSQL storage and a basic REST API: 2-4 weeks, depending on historical depth and the number of target contracts.