Blockchain Explorer Data Scraping

Parsing Data from Blockchain Explorers

Etherscan, BscScan and Polygonscan are convenient interfaces on top of a node, but their APIs have strict limits: 5 requests per second on the free plan, no streaming, and pagination capped at 10,000 records. If you need to download the history of 500K transactions for a specific contract, or track every interaction with an address, you need to know the workarounds and alternatives.

Etherscan API: What It Can and Can't Do

What works well:

  • Address transaction history: ?module=account&action=txlist&address=0x...
  • ERC-20 transfers: ?module=account&action=tokentx&address=0x...
  • Contract verification and ABI retrieval: ?module=contract&action=getabi
  • Contract source code: ?module=contract&action=getsourcecode

Hard limits:

  • Maximum of 10,000 records per request (worked around via block pagination)
  • startblock/endblock parameters are the only pagination method
  • Rate limit: 5 req/sec (free), 10-20 req/sec (paid plans)
  • No WebSocket/streaming
  • Internal transactions (action=txlistinternal) are not always complete

import httpx
import asyncio
from typing import AsyncGenerator

async def get_all_transactions(
    address: str, 
    api_key: str,
    start_block: int = 0
) -> AsyncGenerator[dict, None]:
    """Download ALL address transactions via block pagination"""
    
    base_url = "https://api.etherscan.io/api"
    current_block = start_block
    
    async with httpx.AsyncClient() as client:  # reuse one client for all pages
        while True:
            resp = await client.get(base_url, params={
                "module": "account",
                "action": "txlist",
                "address": address,
                "startblock": current_block,
                "endblock": 99999999,
                "sort": "asc",
                "apikey": api_key,
                "offset": 10000,
                "page": 1,
            })

            data = resp.json()
            # status "0" also covers "No transactions found"
            if data["status"] != "1" or not data["result"]:
                break

            txs = data["result"]
            for tx in txs:
                yield tx

            if len(txs) < 10000:
                break  # last page

            # Next block = last received + 1
            current_block = int(txs[-1]["blockNumber"]) + 1
            await asyncio.sleep(0.2)  # rate limit: 5 req/sec

An important nuance: if a single block contains more than 10,000 transactions for an address (theoretically possible for high-traffic contracts such as USDT), the loop above will hang on that block. The fix is additional pagination inside the block via the page parameter.
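That in-block fallback can be sketched as follows; the fetch callable is a stand-in for the actual Etherscan HTTP request (an assumption here, not part of the API), which makes the pagination logic itself easy to verify:

```python
from typing import Callable

def paginate_block(
    fetch: Callable[[dict], list[dict]],
    address: str,
    block: int,
    page_size: int = 10_000,
) -> list[dict]:
    """Collect all transactions of `address` inside one block by walking
    the `page` parameter until a short (non-full) page comes back."""
    txs: list[dict] = []
    page = 1
    while True:
        batch = fetch({
            "module": "account",
            "action": "txlist",
            "address": address,
            "startblock": block,
            "endblock": block,   # pin pagination to a single block
            "offset": page_size,
            "page": page,
        })
        txs.extend(batch)
        if len(batch) < page_size:
            return txs
        page += 1
```

With startblock and endblock pinned to the same value, the page counter can safely walk past the 10,000-record window without skipping records.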

Etherscan API Alternatives

Alchemy / Infura / QuickNode Enhanced APIs. These allow requests like "all transactions for address X" without the 10,000-record limit:

import { Alchemy, Network } from 'alchemy-sdk';

const alchemy = new Alchemy({ apiKey: process.env.ALCHEMY_KEY, network: Network.ETH_MAINNET });

// Get all Asset Transfers (ERC-20, ERC-721, ETH)
const transfers = await alchemy.core.getAssetTransfers({
  fromAddress: '0x...',
  category: ['external', 'erc20', 'erc721', 'erc1155'],
  withMetadata: true,
  maxCount: 1000,
});

// Continue via pageKey
if (transfers.pageKey) {
  const more = await alchemy.core.getAssetTransfers({
    pageKey: transfers.pageKey,
    // ... same parameters
  });
}

Alchemy's Asset Transfers API is considerably more powerful than Etherscan's: no 10K limit, pagination via pageKey, and ETH plus all token transfers in a single request.
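The same pageKey loop can be driven without the SDK through the underlying alchemy_getAssetTransfers JSON-RPC method. In this sketch, rpc is a placeholder for whatever transport you use (an assumption, not an Alchemy API), so the drain logic stays testable on its own:

```python
def drain_transfers(rpc, params: dict) -> list[dict]:
    """Follow pageKey until the Asset Transfers result set is exhausted."""
    transfers: list[dict] = []
    while True:
        result = rpc("alchemy_getAssetTransfers", [params])
        transfers.extend(result.get("transfers", []))
        page_key = result.get("pageKey")
        if page_key is None:
            return transfers
        # resend the same query plus the continuation token
        params = {**params, "pageKey": page_key}
```

The key point is that every follow-up request must repeat the original filter parameters; pageKey is a continuation token, not a standalone query.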

Moralis Web3 API:

import Moralis from 'moralis';

await Moralis.start({ apiKey: process.env.MORALIS_KEY });

const response = await Moralis.EvmApi.transaction.getWalletTransactions({
  chain: '0x1',
  address: '0x...',
  limit: 100,
  cursor: undefined, // for pagination
});

const { result, cursor } = response.toJSON();

Moralis also offers getWalletTokenTransfers, getNFTTransfers, and cross-chain queries.

Parsing Explorer HTML (When API Isn't Enough)

Sometimes the data exists only in the web interface, not the API: the token holders list on Etherscan, lists of verified contracts, some internal calls. In that case, you scrape the HTML.

import httpx
from bs4 import BeautifulSoup
import asyncio

async def get_token_holders(token_address: str, pages: int = 10) -> list[dict]:
    """Parse top token holders from Etherscan"""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "text/html",
    }
    holders = []
    
    async with httpx.AsyncClient(headers=headers) as client:
        for page in range(1, pages + 1):
            # NOTE: URL and selectors below are illustrative; Etherscan loads
            # the holders table via an embedded endpoint and changes its
            # markup, so verify both before relying on them
            resp = await client.get(
                f"https://etherscan.io/token/{token_address}",
                params={"p": page}
            )

            soup = BeautifulSoup(resp.text, 'html.parser')
            table = soup.find('table')
            if not table:
                break
                
            for row in table.find_all('tr')[1:]:  # skip header
                cols = row.find_all('td')
                if len(cols) >= 3:
                    holders.append({
                        'rank': cols[0].text.strip(),
                        'address': cols[1].find('a')['href'].split('/')[-1],
                        'quantity': cols[2].text.strip(),
                        'percentage': cols[3].text.strip() if len(cols) > 3 else None,
                    })
            
            await asyncio.sleep(2)  # respect server
    
    return holders

Etherscan sits behind Cloudflare bot protection, and aggressive scraping can lead to a temporary IP ban. It is better to use residential proxies or the official API.
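If you do scrape, wrap each request in retry logic with exponential backoff and jitter. A minimal sketch; the RuntimeError here is a stand-in for wherever your code detects an HTTP 429/403, not a real httpx exception:

```python
import random
import time

def fetch_with_backoff(fetch, max_retries: int = 5, base: float = 1.0):
    """Call `fetch`, retrying on throttling with exponential backoff
    plus jitter so retries from parallel workers do not synchronize."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except RuntimeError:  # stand-in: raise this on HTTP 429/403
            if attempt == max_retries - 1:
                raise
            delay = min(base * 2 ** attempt, 30.0)  # cap the wait at 30s
            time.sleep(delay * (0.5 + random.random()))
```

The jitter factor matters when several scraping workers share one proxy pool: without it, all workers retry in lockstep and hit the rate limit again together.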

Direct Node Work

For maximum data completeness, run your own node (for example Erigon with the trace API enabled for internal transactions) or build your own Etherscan-like indexer. It is more expensive, but it gives you:

  • Internal transactions without limits
  • Data by storage slots
  • Trace calls for MEV analysis

# eth_getBlockReceipts: all receipts of a block in one request
curl -X POST $ETH_RPC_URL \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_getBlockReceipts","params":["0x1234567"],"id":1}'

eth_getBlockReceipts (now part of the standard execution-layer JSON-RPC API, supported by Erigon, Geth, Alchemy and Infura) returns all receipts of a block in one request, which is significantly more efficient than N separate eth_getTransactionReceipt calls.

Storage and Deduplication

Parallel collection from multiple sources creates duplicates. The strategy: INSERT ... ON CONFLICT (tx_hash) DO NOTHING for transactions, and a unique constraint on (tx_hash, log_index) for events.

CREATE TABLE eth_transactions (
    tx_hash      CHAR(66) PRIMARY KEY,
    block_number BIGINT NOT NULL,
    from_address CHAR(42) NOT NULL,
    to_address   CHAR(42),
    value        NUMERIC(38) DEFAULT 0,
    gas_used     BIGINT,
    status       SMALLINT,
    ts           TIMESTAMPTZ
);

-- Safe upsert without errors on duplicates
INSERT INTO eth_transactions VALUES (...)
ON CONFLICT (tx_hash) DO NOTHING;