NFT Marketplace Data Scraping (OpenSea, Blur, Magic Eden)
OpenSea exposes data through an official API, but the 2 req/sec rate limit on the free tier turns proper historical collection gathering into a slog. Blur and Magic Eden each have their own model: Blur is still largely built on undocumented endpoints, while Magic Eden is split into two incompatible APIs (Solana and EVM). If you need complete sales data, listings, and floor prices in real time, off-the-shelf solutions barely exist; you have to build a parser for each platform separately.
Data Collection Architecture
OpenSea: Official API vs. Events Endpoint
OpenSea Developer API v2 is the only official path. Key methods:
- `GET /api/v2/events/collection/{slug}`: event history (sales, listings, transfers) with pagination via the `next` cursor
- `GET /api/v2/listings/collection/{slug}/all`: active listings with prices
- `GET /api/v2/collections/{slug}/stats`: floor price, volume, supply
Problem: the events endpoint returns at most 50 events per request, and the rate limit is not documented explicitly, but in practice more than 2-3 req/sec leads to HTTP 429. Collecting the history of a 100k+ sales collection this way takes weeks with naive sequential requests.
Optimization: parallel workers with backoff, splitting the event range via the `occurred_after` / `occurred_before` parameters. With a Pro API key the limits grow to 10-20 req/sec, and the same history collection takes hours instead.
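A minimal sketch of one such worker, draining a single time window with cursor pagination and exponential backoff on 429. The 50-event page size and `occurred_after` / `occurred_before` parameters follow the endpoint description above; the cursor parameter name, the `fetch_all_events` helper, and the injected `get` callable (a stand-in for a real HTTP client with the API key header) are assumptions of this sketch:

```python
import time

def fetch_all_events(get, slug, occurred_after, occurred_before, max_retries=5):
    """Drain one time window of the events endpoint with cursor pagination.

    `get(url, params)` is an injected HTTP helper returning a
    (status_code, json_body) pair, so the real client and auth header
    stay outside this sketch.
    """
    url = f"https://api.opensea.io/api/v2/events/collection/{slug}"
    params = {
        "occurred_after": occurred_after,
        "occurred_before": occurred_before,
        "limit": 50,
    }
    events, cursor = [], None
    while True:
        if cursor:
            params["next"] = cursor
        delay = 1.0
        for _ in range(max_retries):
            status, body = get(url, params)
            if status != 429:
                break
            time.sleep(delay)      # rate-limited: back off, retry same page
            delay *= 2
        else:
            raise RuntimeError("still rate-limited after retries")
        events.extend(body.get("asset_events", []))
        cursor = body.get("next")
        if not cursor:             # no cursor: window fully drained
            return events
```

Each parallel worker gets its own disjoint `(occurred_after, occurred_before)` slice, so the slices can be merged without deduplication.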
For WebSocket updates there is the OpenSea Stream API, built on Phoenix Channels. Subscribing to `collection:{slug}` gives real-time listing and sale events without polling.
from opensea_stream import OpenSeaStreamClient, Network
client = OpenSeaStreamClient(token=API_KEY, network=Network.MAINNET)
client.on_item_sold("collection-slug", lambda event: handle_sale(event))
client.connect()
Blur: Working with Undocumented API
Blur doesn't provide a public API. Data is available via:
- Blur GraphQL (`https://core-api.prod.blur.io/graphql`): not documented, but stable. The requests for collections, listings, and bids are easy to extract via browser DevTools.
- Reservoir Protocol: a liquidity aggregator that indexes Blur events on-chain. It provides a single API for data from all major marketplaces, including Blur.
Reservoir is the most reliable path for Blur data. The API is well documented and has an SDK:
import { createClient } from "@reservoir0x/reservoir-sdk"
const client = createClient({ apiBase: "https://api.reservoir.tools", apiKey: KEY })
const sales = await client.getSales({ collection: "0x...", limit: 100 })
Magic Eden: EVM and Solana Gap
Magic Eden has two incompatible APIs:
| API | Blockchains | Endpoint |
|---|---|---|
| Solana API v2 | Solana | api-mainnet.magiceden.dev/v2/ |
| Developer API | Ethereum, Polygon, Base | api-mainnet.magiceden.dev/v3/rtp/ |
The Developer API (EVM) is based on Reservoir under the hood: same data structures, same principles. The Solana API v2 is a different story: `GET /collections/{symbol}/activities` for events, with pagination via `offset`.
Solana sales data lives in transaction logs. For a full history independent of the Magic Eden API, parse on-chain directly via Helius or QuickNode, filtering by the Magic Eden program ID (MEisE1HzehtrDpAAT8PnLHjpSSkRYakotTuJRPjTpo8).
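The offset pagination of the Solana API v2 can be sketched as follows. The endpoint path and `offset` / `limit` parameters come from the description above; the `fetch_solana_activities` name and the injected `get_json` helper (standing in for a real HTTP client with retries) are assumptions of this sketch:

```python
def fetch_solana_activities(get_json, symbol, page_size=100, max_pages=10_000):
    """Drain the activities feed for one collection via offset pagination.

    `get_json(url, params)` is an injected helper returning one decoded
    JSON page (a list of activity dicts).
    """
    url = f"https://api-mainnet.magiceden.dev/v2/collections/{symbol}/activities"
    offset, activities = 0, []
    for _ in range(max_pages):
        page = get_json(url, {"offset": offset, "limit": page_size})
        if not page:                  # empty page: feed is drained
            break
        activities.extend(page)
        offset += len(page)
    return activities
```

Offset pagination is fragile when new events land mid-crawl, which is one more argument for the on-chain parsing path when completeness matters.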
Data Storage and Processing
Schema for sales data in PostgreSQL:
CREATE TABLE nft_sales (
id BIGSERIAL PRIMARY KEY,
blockchain VARCHAR(20) NOT NULL,
marketplace VARCHAR(20) NOT NULL,
contract_address VARCHAR(42),
token_id VARCHAR(78),
seller_address VARCHAR(42),
buyer_address VARCHAR(42),
price_raw NUMERIC(38,0),
price_usd DECIMAL(20,6),
currency_symbol VARCHAR(10),
transaction_hash VARCHAR(66) UNIQUE,
block_number BIGINT,
event_timestamp TIMESTAMPTZ NOT NULL,
raw_data JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX ON nft_sales (contract_address, event_timestamp DESC);
CREATE INDEX ON nft_sales (marketplace, event_timestamp DESC);
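A sketch of normalizing one marketplace event onto this schema. The input field names assume Reservoir's sale shape (`token.contract`, `price.amount.raw`, etc.) and should be checked against the actual payload; the `normalize_reservoir_sale` helper is hypothetical:

```python
def normalize_reservoir_sale(sale):
    """Map one Reservoir-style sale dict onto the nft_sales columns.

    Assumed input shape (verify against the real API response):
    token.contract, token.tokenId, from, to, price.amount.raw,
    price.currency.symbol, txHash, block, timestamp.
    """
    return {
        "blockchain": "ethereum",
        "marketplace": sale.get("orderSource", "unknown"),
        "contract_address": sale["token"]["contract"],
        "token_id": sale["token"]["tokenId"],
        "seller_address": sale["from"],
        "buyer_address": sale["to"],
        "price_raw": int(sale["price"]["amount"]["raw"]),
        "currency_symbol": sale["price"]["currency"]["symbol"],
        "transaction_hash": sale["txHash"],
        "block_number": sale["block"],
        "event_timestamp": sale["timestamp"],
        "raw_data": sale,              # keep the untouched payload in JSONB
    }
```

Inserting with `ON CONFLICT (transaction_hash) DO NOTHING` makes ingestion idempotent, so overlapping worker windows and replayed WebSocket events cannot create duplicates.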
The JSONB field raw_data preserves the original API response, which is useful when a marketplace changes its data structure. Price conversion to USD: in real time via the CoinGecko API, or retrospectively via OHLCV data.
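The conversion itself is simple once a spot price is known. A minimal sketch; the `price_to_usd` name and the decimals table are assumptions (extend the table for whatever payment currencies actually appear in the data):

```python
from decimal import Decimal

# Smallest-unit decimals per payment currency; an assumption, extend as needed.
CURRENCY_DECIMALS = {"ETH": 18, "WETH": 18, "SOL": 9}

def price_to_usd(price_raw, currency_symbol, spot_usd):
    """Convert a raw on-chain amount (wei, lamports, ...) into USD given
    the currency's spot price, e.g. fetched from CoinGecko.

    Decimal avoids float rounding on 38-digit raw amounts and matches the
    DECIMAL(20,6) column.
    """
    decimals = CURRENCY_DECIMALS[currency_symbol]
    amount = Decimal(price_raw) / Decimal(10) ** decimals
    return (amount * Decimal(str(spot_usd))).quantize(Decimal("0.000001"))
```

For retrospective backfills, `spot_usd` comes from the OHLCV candle covering `event_timestamp` rather than a live quote.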
Anti-Detection for Web Scraping
If a marketplace doesn't expose an API and browser scraping is needed (a rare case, but it happens with aggregators): Playwright + rotating residential proxies + a stealth plugin (playwright-extra + puppeteer-extra-plugin-stealth). Without this, headless Chrome is detected via navigator.webdriver, canvas fingerprinting, and timing patterns.
For rate limit management: a token bucket algorithm, with Redis as a distributed rate limiter when requests come from multiple machines.
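The in-process version of the token bucket fits in a few lines; for a fleet of scrapers the same refill-then-take logic is typically ported to a Redis Lua script so every worker draws from one shared bucket (assumed here, not shown):

```python
import time

class TokenBucket:
    """Non-blocking in-process token bucket."""

    def __init__(self, rate, capacity):
        self.rate = rate                 # tokens refilled per second
        self.capacity = capacity         # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def acquire(self, n=1):
        """Take n tokens if available; return False instead of blocking."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A worker that gets `False` simply sleeps briefly and retries, which naturally smooths traffic under the marketplace's effective req/sec ceiling.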
Process and Timeline
Day 1: analyze the target marketplaces, obtain API keys, design the storage schema, implement basic HTTP clients with retry/backoff.
Days 2-3: workers for historical collection, WebSocket integration for real-time data, normalization across marketplace formats, documentation, and basic monitoring.
Result: a system that collects the complete history and keeps data current with a 30-60 second delay relative to the actual blockchain event.