NFT Collection Data Scraping (floor price, volume, holders)
The OpenSea API returns the floor price with a 5-15 minute delay and aggregates data according to its own methodology. For trading bots, analytics platforms, and minting dApps that need the real floor, this is unacceptable. The only path to accurate data is to read events directly from the blockchain.
Data Sources: Where to Get What
On-chain events
For ERC-721/ERC-1155 collections, all sales are visible through marketplace events. Each marketplace emits its own event:
- OpenSea Seaport: OrderFulfilled(bytes32 orderHash, address offerer, address zone, address recipient, SpentItem[] offer, ReceivedItem[] consideration), contract 0x00000000000000ADc04C56Bf30aC9d3c0aAF14dC (Seaport 1.5)
- Blur: OrdersMatched on 0x000000000000Ad05Ccc4F10045630fb830B95127
- LooksRare v2: TakerAsk/TakerBid
- X2Y2: EvInventory
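To make the Seaport case concrete, here is a minimal sketch of turning a decoded OrderFulfilled event into a sale price. The SeaportItem and Sale types and the seaportSale function are illustrative names, not part of any library. The sketch assumes the event has already been decoded into plain objects and that mixed-currency orders can simply be skipped.

```typescript
// Shape of a decoded SpentItem / ReceivedItem (recipient omitted, not needed for pricing).
type SeaportItem = { itemType: number; token: string; identifier: bigint; amount: bigint };

// Seaport ItemType enum: 0 = native, 1 = ERC-20, 2 = ERC-721, 3 = ERC-1155,
// 4 and 5 = criteria-based variants of the two NFT types.
const isCurrency = (i: SeaportItem): boolean => i.itemType <= 1;
const isNft = (i: SeaportItem): boolean => i.itemType >= 2;

type Sale = { kind: 'listing' | 'bid'; currency: string; totalAmount: bigint; tokenIds: bigint[] };

function seaportSale(offer: SeaportItem[], consideration: SeaportItem[]): Sale | null {
  // Listing: the offerer gives the NFT, consideration holds the payments (seller + fees).
  // Accepted bid: the offerer gives the currency, consideration holds the NFT.
  const offerHasNft = offer.some(isNft);
  const nfts = (offerHasNft ? offer : consideration).filter(isNft);
  const payments = (offerHasNft ? consideration : offer).filter(isCurrency);
  if (nfts.length === 0 || payments.length === 0) return null;

  const currency = payments[0].token.toLowerCase();
  // Assumption: skip mixed-currency orders instead of converting between tokens.
  if (payments.some(p => p.token.toLowerCase() !== currency)) return null;

  return {
    kind: offerHasNft ? 'listing' : 'bid',
    currency,
    totalAmount: payments.reduce((sum, p) => sum + p.amount, 0n),
    tokenIds: nfts.map(n => n.identifier),
  };
}
```

For bundle sales the total covers all tokens in the order, so a per-token price would need an extra split that this sketch does not attempt.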
Floor price cannot be obtained from these events directly: they show executed orders, not active listings. For the current floor you need to either index active listings through a marketplace API or use an aggregator.
Holders and Transfers
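Transfer(address indexed from, address indexed to, uint256 indexed tokenId) is the standard ERC-721 event. The complete ownership graph is built by replaying all Transfer events from the deployment block. Unique holders are the addresses that still own at least one token after the replay: every address that appeared as to, minus those that have since transferred all of their tokens away (and excluding the zero address, which typically marks burns).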
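The replay can be sketched as a pure function over already decoded events. The Erc721Transfer type and replayHolders are hypothetical names for illustration. The sketch assumes burns are transfers to the zero address, which is common but not mandated by the standard.

```typescript
type Erc721Transfer = {
  from: string;
  to: string;
  tokenId: bigint;
  blockNumber: number;
  logIndex: number;
};

const ZERO_ADDRESS = '0x0000000000000000000000000000000000000000';

// Returns holder address -> number of tokens currently owned.
function replayHolders(transfers: Erc721Transfer[]): Map<string, number> {
  // Replay in chain order: by block, then by position of the log inside the block.
  const ordered = [...transfers].sort(
    (a, b) => a.blockNumber - b.blockNumber || a.logIndex - b.logIndex
  );

  // tokenId -> current owner; the last transfer of each token wins.
  const ownerOf = new Map<string, string>();
  for (const t of ordered) {
    const id = t.tokenId.toString();
    const to = t.to.toLowerCase();
    if (to === ZERO_ADDRESS) ownerOf.delete(id); // assumed burn
    else ownerOf.set(id, to);
  }

  const holders = new Map<string, number>();
  for (const owner of ownerOf.values()) {
    holders.set(owner, (holders.get(owner) ?? 0) + 1);
  }
  return holders;
}
```

The number of unique holders is then the size of the returned map.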
For ERC-1155, the events are TransferSingle and TransferBatch. Ownership here is a balance rather than a binary state: balanceOf(address, tokenId).
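A minimal sketch of the balance replay for ERC-1155 follows. It assumes both event kinds were normalized into one hypothetical shape (TransferSingle becomes arrays of length one) and that the input is already in chain order. Erc1155Transfer and replayBalances1155 are illustrative names.

```typescript
type Erc1155Transfer = { from: string; to: string; ids: bigint[]; values: bigint[] };

const ZERO_ADDR_1155 = '0x0000000000000000000000000000000000000000';

// Returns "address:tokenId" -> balance, keeping only non-zero balances.
function replayBalances1155(events: Erc1155Transfer[]): Map<string, bigint> {
  const balances = new Map<string, bigint>();

  const add = (owner: string, id: bigint, delta: bigint): void => {
    if (owner === ZERO_ADDR_1155) return; // mint source / burn target holds no balance
    const key = `${owner}:${id}`;
    const next = (balances.get(key) ?? 0n) + delta;
    if (next === 0n) balances.delete(key);
    else balances.set(key, next);
  };

  for (const e of events) {
    e.ids.forEach((id, i) => {
      add(e.from.toLowerCase(), id, -e.values[i]);
      add(e.to.toLowerCase(), id, e.values[i]);
    });
  }
  return balances;
}
```

Holders of a given token ID are then the addresses whose key for that ID is present in the map.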
Parser Architecture
Stack
ethereum-node (Alchemy/Infura/Quicknode)
→ ethers.js / viem (event filtering)
→ message queue (Redis Streams / BullMQ)
→ PostgreSQL / ClickHouse (storage)
→ REST/WebSocket API (data delivery)
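For historical data, use getLogs with a filter on address and topics[0]. Split the block range into batches of about 2000 blocks; many RPC providers cap the block range or result size of eth_getLogs, and the exact limit depends on the provider and plan: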
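```typescript
import { Interface, JsonRpcProvider, LogDescription } from 'ethers'; // ethers v6

// ABI fragment for the standard ERC-721 Transfer event.
const iface = new Interface([
  'event Transfer(address indexed from, address indexed to, uint256 indexed tokenId)',
]);

// Fetches and decodes Transfer logs of one contract in [fromBlock, toBlock] (inclusive).
async function fetchTransferEvents(
  contract: string,
  fromBlock: number,
  toBlock: number,
  provider: JsonRpcProvider
): Promise<LogDescription[]> {
  const filter = {
    address: contract,
    // ethers v6 replaces v5's iface.getEventTopic('Transfer') with getEvent(...).topicHash
    topics: [iface.getEvent('Transfer')!.topicHash],
    fromBlock,
    toBlock,
  };
  const logs = await provider.getLogs(filter);
  // parseLog returns null when a log does not match the ABI, so drop those.
  return logs
    .map(log => iface.parseLog({ topics: [...log.topics], data: log.data }))
    .filter((parsed): parsed is LogDescription => parsed !== null);
}
```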
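The batching itself can be sketched as a small generator. blockRanges is a hypothetical helper, and the default step of 2000 is taken from the text above rather than from any specific provider's documentation. Ranges are inclusive on both ends because eth_getLogs treats fromBlock and toBlock as inclusive.

```typescript
// Yields inclusive [start, end] block ranges that cover [fromBlock, toBlock].
function* blockRanges(
  fromBlock: number,
  toBlock: number,
  step = 2000
): Generator<[number, number]> {
  if (step <= 0) throw new Error('step must be positive');
  for (let start = fromBlock; start <= toBlock; start += step) {
    // The last range is clipped so it never goes past toBlock.
    yield [start, Math.min(start + step - 1, toBlock)];
  }
}
```

Each yielded pair can be passed as fromBlock and toBlock to fetchTransferEvents, one batch at a time or through a concurrency-limited queue.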
For real-time data, use a WebSocket subscription via provider.on(filter, callback) or eth_subscribe with the logs subscription type (supported by Alchemy and other providers).
Computing Floor Price
Two approaches:
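1. Marketplace API aggregation: request the floor from OpenSea, Blur, and LooksRare and take the minimum. The problem is rate limits and caching on the API side.
2. Orderbook indexing: subscribe to order creation and cancellation events. For Seaport these are OrderValidated (on-chain validation), OrderCancelled, and OrderFulfilled (execution). Build a local orderbook and compute the floor yourself. This is more accurate but harder to maintain as contracts are updated. Keep in mind that most Seaport listings are signed off-chain and never emit OrderValidated, so on-chain events usually have to be supplemented with the marketplace's listing feed.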
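The second approach can be sketched as a tiny in-memory orderbook. OrderEvent and LocalOrderbook are hypothetical names, and the events are assumed to be already decoded and priced. The sketch deliberately ignores order expiry, partial fills, and bulk cancellation via counter increments, all of which a real indexer would have to handle.

```typescript
type OrderEvent =
  | { kind: 'validated'; orderHash: string; priceWei: bigint }
  | { kind: 'cancelled' | 'fulfilled'; orderHash: string };

class LocalOrderbook {
  // orderHash -> asking price of every listing believed to be active.
  private active = new Map<string, bigint>();

  apply(e: OrderEvent): void {
    if (e.kind === 'validated') this.active.set(e.orderHash, e.priceWei);
    else this.active.delete(e.orderHash); // cancelled or fulfilled: no longer active
  }

  // Lowest active price, or null when nothing is listed.
  floor(): bigint | null {
    let min: bigint | null = null;
    for (const price of this.active.values()) {
      if (min === null || price < min) min = price;
    }
    return min;
  }
}
```

A linear scan is enough for a sketch; a sorted structure or a database index would be a more reasonable choice for collections with many listings.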
For most tasks, the first approach with a 60-second cache is enough.
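A minimal sketch of the cached first approach follows. It assumes each marketplace is wrapped in a hypothetical FloorSource function that returns the floor in ETH or null, so no real API client or endpoint is shown. cachedFloor and the injectable now clock are illustrative, the latter only to keep the cache testable.

```typescript
// One function per marketplace; how it calls the API is outside this sketch.
type FloorSource = () => Promise<number | null>;

function cachedFloor(
  sources: FloorSource[],
  ttlMs = 60_000,
  now: () => number = Date.now
): () => Promise<number | null> {
  let cached: { value: number; at: number } | null = null;

  return async () => {
    if (cached && now() - cached.at < ttlMs) return cached.value;

    // A failing or rate-limited marketplace must not break the others.
    const results = await Promise.allSettled(sources.map(s => s()));
    const floors = results.flatMap(r =>
      r.status === 'fulfilled' && r.value !== null ? [r.value] : []
    );
    if (floors.length === 0) return null; // nothing usable, do not cache

    cached = { value: Math.min(...floors), at: now() };
    return cached.value;
  };
}
```

The floor is kept as a number here for simplicity; for exact arithmetic the sources could return wei as bigint instead.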
Storage and Queries
ClickHouse is more efficient than PostgreSQL for time-series NFT data: analytics queries over millions of rows can run 10-50x faster. Schema:
| Column | Type | Description |
|---|---|---|
| block_number | UInt64 | Event block |
| tx_hash | FixedString(66) | Transaction hash |
| contract | FixedString(42) | Collection address |
| token_id | UInt256 | Token ID |
| from | FixedString(42) | Seller/sender |
| to | FixedString(42) | Buyer/recipient |
| price_wei | UInt256 | Price in wei |
| marketplace | LowCardinality(String) | Marketplace |
| timestamp | DateTime | Block time |
Partition by month (toYYYYMM(timestamp)) and use (contract, timestamp) as the sorting key.
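To illustrate the kind of aggregation this schema is meant to serve, here is a sketch of daily volume per collection computed in TypeScript over rows with the same column names. In practice this would be a GROUP BY in the database; SaleRow and dailyVolume are hypothetical names, and timestamp is assumed to be Unix seconds.

```typescript
type SaleRow = { contract: string; price_wei: bigint; timestamp: number };

// Returns "contract|YYYY-MM-DD" (UTC) -> total volume in wei.
function dailyVolume(rows: SaleRow[]): Map<string, bigint> {
  const volume = new Map<string, bigint>();
  for (const row of rows) {
    // toISOString is always UTC, so the day boundary does not depend on the local time zone.
    const day = new Date(row.timestamp * 1000).toISOString().slice(0, 10);
    const key = `${row.contract.toLowerCase()}|${day}`;
    volume.set(key, (volume.get(key) ?? 0n) + row.price_wei);
  }
  return volume;
}
```

Prices stay in bigint because wei amounts exceed the safe integer range of a JavaScript number.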
Solving Typical Problems
Rate limits: Alchemy Free allows 330 CUPS (compute units per second), Growth allows 660 CUPS. When parsing the history of a large collection (BAYC: 500k+ Transfer events) without throttling, you will get HTTP 429 responses. Implement exponential backoff plus a queue with concurrency control.
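Both pieces can be sketched without any library. withBackoff and mapLimit are hypothetical helpers; the retry count, base delay, and jitter are arbitrary assumptions. For brevity the sketch retries on every error, whereas a real client would retry only on 429 and transient network failures.

```typescript
const defaultSleep = (ms: number): Promise<void> =>
  new Promise(resolve => setTimeout(resolve, ms));

// Retries fn with delays of baseMs * 2^attempt plus random jitter.
async function withBackoff<T>(
  fn: () => Promise<T>,
  retries = 5,
  baseMs = 250,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries, surface the error
      await sleep(baseMs * 2 ** attempt + Math.random() * baseMs);
    }
  }
}

// Runs worker over items with at most `limit` calls in flight; results keep input order.
async function mapLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const run = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++; // claim an index before awaiting, so no item is processed twice
      results[i] = await worker(items[i]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => run()));
  return results;
}
```

Combining the two, each block batch could be processed as mapLimit(ranges, 4, r => withBackoff(() => fetchBatch(r))), where fetchBatch stands for whatever function performs the actual RPC call.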
Blockchain reorganizations: events from the most recent blocks (for example, the last 12) should be marked as "pending" and confirmed only after finality. On Ethereum PoS, economic finality takes 2 epochs (64 slots, about 12.8 minutes).
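One way to sketch the pending/confirmed logic is a reconciliation pass over stored events. StoredEvent and reconcile are hypothetical names, and canonicalHash is assumed to be a map of block number to the block hash currently reported by the node. Using a fixed depth of 64 blocks is a simplification; querying the node's finalized block tag would be a more direct check.

```typescript
type StoredEvent = {
  blockNumber: number;
  blockHash: string;
  status: 'pending' | 'confirmed';
};

function reconcile(
  events: StoredEvent[],
  head: number,
  canonicalHash: Map<number, string>,
  finalityDepth = 64
): StoredEvent[] {
  const kept: StoredEvent[] = [];
  for (const e of events) {
    if (e.status === 'confirmed') {
      kept.push(e); // already final, never revisited
      continue;
    }
    const hash = canonicalHash.get(e.blockNumber);
    // The node now reports a different block at this height: the event was reorged out.
    if (hash !== undefined && hash !== e.blockHash) continue;
    // Confirm only when the hash was verified and the block is deep enough.
    const isFinal = hash !== undefined && head - e.blockNumber >= finalityDepth;
    kept.push({ ...e, status: isFinal ? 'confirmed' : 'pending' });
  }
  return kept;
}
```

Events dropped by this pass have to be re-fetched from the new canonical blocks, which the sketch leaves to the regular ingestion loop.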
Wash trading: volume between addresses with circular transfers distorts the stats. A basic heuristic: flag trades where from and to are related addresses, for example both funded with ETH from the same source.
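The funding-source heuristic can be sketched as follows. It assumes a precomputed firstFunder map (address to the address that first sent it ETH), which has to be built separately from transaction history. TradeRow, flagSuspectedWashTrades, and the ignoredFunders set for shared sources such as exchange hot wallets are all illustrative.

```typescript
type TradeRow = { txHash: string; from: string; to: string };

// Returns the tx hashes of trades whose two sides look related.
// All addresses in firstFunder and ignoredFunders are expected in lowercase.
function flagSuspectedWashTrades(
  trades: TradeRow[],
  firstFunder: Map<string, string>,
  ignoredFunders: Set<string> = new Set()
): Set<string> {
  const flagged = new Set<string>();
  for (const t of trades) {
    const from = t.from.toLowerCase();
    const to = t.to.toLowerCase();
    const funderOfFrom = firstFunder.get(from);
    const funderOfTo = firstFunder.get(to);

    const sameFunder =
      funderOfFrom !== undefined &&
      funderOfFrom === funderOfTo &&
      !ignoredFunders.has(funderOfFrom);
    // One side funded the other directly.
    const directlyFunded = funderOfFrom === to || funderOfTo === from;

    if (from === to || sameFunder || directlyFunded) flagged.add(t.txHash);
  }
  return flagged;
}
```

This only sets a flag for later review; it is a heuristic and will produce both false positives and false negatives.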
Timeline Estimates
Transfer events parser plus holders tracker: 1 day. Adding floor price via marketplace API with a cache: another half day. Historical backfill for a large collection plus a dashboard: 2-3 days total.







