NFT Marketplace Data Scraping (OpenSea, Blur, Magic Eden)


OpenSea returns data via its API, but the 2 req/sec rate limit on the free tier turns proper historical collection gathering into an unpleasant quest. Blur and Magic Eden each have their own model: Blur is still largely built on undocumented endpoints, and Magic Eden is split into two incompatible APIs (Solana and EVM). If you need complete sales data, listings, and floor prices in real time, off-the-shelf solutions barely exist; you need to build a parser for each platform separately.

Data Collection Architecture

OpenSea: Official API vs. Events Endpoint

The OpenSea Developer API v2 is the only official path. Key methods:

  • GET /api/v2/events/collection/{slug} — event history (sales, listings, transfers) with pagination by next cursor
  • GET /api/v2/listings/collection/{slug}/all — active listings with prices
  • GET /api/v2/collections/{slug}/stats — floor price, volume, supply

Problem: the events endpoint returns at most 50 events per request, and the rate limit is not documented explicitly, but in practice anything over 2-3 req/sec triggers HTTP 429. Collecting the history of a collection with 100k+ sales this way takes weeks with naive sequential requests.

Optimization: parallel workers with backoff, splitting the event range via the occurred_after / occurred_before parameters. With a Pro API key the limits grow to 10-20 req/sec, and the same history collection takes hours instead of weeks.
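
A minimal sketch of this pattern: shard the time range, give each worker its own occurred_after / occurred_before window, and page via the next cursor with exponential backoff on 429. The fetch callable is injected, so names like fetch and the response dict shape are assumptions of this sketch, not OpenSea SDK calls.

```python
import time
import random
from concurrent.futures import ThreadPoolExecutor

def split_range(start_ts: int, end_ts: int, shards: int) -> list[tuple[int, int]]:
    """Split [start_ts, end_ts) into `shards` contiguous time windows."""
    step = (end_ts - start_ts) // shards
    bounds = [start_ts + i * step for i in range(shards)] + [end_ts]
    return list(zip(bounds[:-1], bounds[1:]))

def fetch_with_backoff(fetch, url, params, max_retries=6):
    """Call `fetch`; on HTTP 429 sleep with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        resp = fetch(url, params)
        if resp["status"] != 429:
            return resp
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError("still rate limited after retries")

def collect_shard(fetch, slug, after, before):
    """Page through one time window via the `next` cursor."""
    events, cursor = [], None
    while True:
        params = {"occurred_after": after, "occurred_before": before}
        if cursor:
            params["next"] = cursor
        page = fetch_with_backoff(fetch, f"/api/v2/events/collection/{slug}", params)
        events.extend(page["asset_events"])
        cursor = page.get("next")
        if not cursor:
            return events

def collect_history(fetch, slug, start_ts, end_ts, workers=8):
    """Run one worker per time shard and merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(collect_shard, fetch, slug, a, b)
                   for a, b in split_range(start_ts, end_ts, workers)]
        return [e for f in futures for e in f.result()]
```

Keeping the HTTP call injected also makes the sharding and backoff logic testable without touching the network.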

For WebSocket updates there is the OpenSea Stream API, built on Phoenix Channels. Subscribing to collection:{slug} delivers listing and sale events in real time without polling.

from opensea_stream import OpenSeaStreamClient, Network  # community Python client

def handle_sale(event):
    print(event)  # persist the sale event here

client = OpenSeaStreamClient(token=API_KEY, network=Network.MAINNET)
client.on_item_sold("collection-slug", handle_sale)
client.connect()

Blur: Working with Undocumented API

Blur doesn't provide public API. Data available via:

  1. Blur GraphQL (https://core-api.prod.blur.io/graphql): not documented, but stable. The queries for collections, listings, and bids are easy to extract from the browser DevTools Network tab.
  2. Reservoir Protocol: a liquidity aggregator that indexes Blur events on-chain and provides a single API for data from all major marketplaces, Blur included.

Reservoir is the most reliable path to Blur data. Its REST API is well documented, and the data endpoints need nothing beyond a plain fetch:

// /sales/v6 returns recent sales for a collection
const res = await fetch(
  "https://api.reservoir.tools/sales/v6?collection=0x...&limit=100",
  { headers: { "x-api-key": KEY } }
)
const { sales } = await res.json()
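
For full history, Reservoir pages via a continuation token rather than an offset. A small sketch of that loop, with the HTTP call injected as fetch_json (an assumption of this sketch) so the paging logic stays testable offline:

```python
# Page Reservoir's /sales/v6 endpoint by passing back the `continuation`
# token from each response until the API stops returning one.

RESERVOIR_SALES = "https://api.reservoir.tools/sales/v6"

def iter_reservoir_sales(fetch_json, collection, limit=100):
    """Yield sale records across pages for one collection."""
    params = {"collection": collection, "limit": limit}
    while True:
        page = fetch_json(RESERVOIR_SALES, params)
        yield from page.get("sales", [])
        token = page.get("continuation")
        if not token:
            return
        params["continuation"] = token  # request the next page
```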

Magic Eden: EVM and Solana Gap

Magic Eden has two incompatible APIs:

  • Solana API v2 (Solana): api-mainnet.magiceden.dev/v2/
  • Developer API (Ethereum, Polygon, Base): api-mainnet.magiceden.dev/v3/rtp/

The Developer API (EVM) is built on Reservoir under the hood: same data structures, same principles. The Solana API v2 is a different story: GET /collections/{symbol}/activities for events, with pagination via offset.
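
The offset pagination on the Solana side can be sketched like this; fetch_json is an injected HTTP helper (an assumption of this sketch), which keeps the paging loop testable without network access.

```python
# Walk the Magic Eden Solana API v2 activities feed by advancing `offset`
# until the API returns an empty page.

BASE = "https://api-mainnet.magiceden.dev/v2"

def iter_activities(fetch_json, symbol, page_size=500):
    """Yield activity records for a collection, page by page."""
    offset = 0
    while True:
        batch = fetch_json(f"{BASE}/collections/{symbol}/activities",
                           {"offset": offset, "limit": page_size})
        if not batch:
            return  # empty page means we are done
        yield from batch
        offset += len(batch)
```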

Solana sales data lives in transaction logs. For a full history independent of the Magic Eden API, parse on-chain directly via Helius or QuickNode, filtering by the Magic Eden program ID (MEisE1HzehtrDpAAT8PnLHjpSSkRYakotTuJRPjTpo8).

Data Storage and Processing

Schema for sales data in PostgreSQL:

CREATE TABLE nft_sales (
    id BIGSERIAL PRIMARY KEY,
    blockchain VARCHAR(20) NOT NULL,
    marketplace VARCHAR(20) NOT NULL,
    contract_address VARCHAR(42),
    token_id VARCHAR(78),
    seller_address VARCHAR(42),
    buyer_address VARCHAR(42),
    price_raw NUMERIC(38,0),
    price_usd DECIMAL(20,6),
    currency_symbol VARCHAR(10),
    transaction_hash VARCHAR(66) UNIQUE,
    block_number BIGINT,
    event_timestamp TIMESTAMPTZ NOT NULL,
    raw_data JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON nft_sales (contract_address, event_timestamp DESC);
CREATE INDEX ON nft_sales (marketplace, event_timestamp DESC);

The JSONB raw_data field preserves the original API response, which is useful when a marketplace changes its data structure. Prices are converted to USD either in real time via the CoinGecko API or retrospectively from OHLCV data.
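
Since price_raw stores the integer on-chain amount, USD conversion needs the currency's decimals and a spot rate (from CoinGecko or OHLCV data). A small sketch using Decimal to match the DECIMAL(20,6) column without float drift; the DECIMALS table here is an assumption with common defaults:

```python
from decimal import Decimal

# Token decimals for the currencies we expect (assumed defaults, extend as needed).
DECIMALS = {"ETH": 18, "WETH": 18, "SOL": 9, "MATIC": 18}

def to_usd(price_raw: int, currency: str, usd_rate: Decimal) -> Decimal:
    """Convert a raw integer on-chain amount into USD, quantized to 6 places."""
    units = Decimal(price_raw) / (Decimal(10) ** DECIMALS[currency])
    return (units * usd_rate).quantize(Decimal("0.000001"))
```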

Anti-Detection for Web Scraping

If a marketplace offers no API and browser scraping is needed (a rare case, but it happens with aggregators): Playwright plus rotating residential proxies plus a stealth plugin (playwright-extra with puppeteer-extra-plugin-stealth). Without this, headless Chrome is detected via navigator.webdriver, canvas fingerprinting, and timing patterns.

For rate limit management: a token bucket algorithm, with Redis as a distributed rate limiter when requests come from multiple machines.
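
The token bucket itself is a few lines. A minimal in-process sketch; in the distributed case the same refill-and-take logic is typically moved into a Redis Lua script so the check is atomic across machines. The injected clock is there purely to make the behavior testable.

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request costs tokens."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then take `cost` tokens if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```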

Process and Timeline

Day 1: analyze the target marketplaces, obtain API keys, design the storage schema, implement basic HTTP clients with retry/backoff.

Days 2-3: workers for historical collection, WebSocket integration for realtime updates, data normalization across marketplace formats, documentation, and basic monitoring.

Result: a system that collects the complete history and keeps data current with a 30-60 second delay relative to the on-chain event.