OHLCV Historical Candle Data Scraping
Historical OHLCV data is needed for backtesting strategies, building charts, calculating technical indicators, and training ML models. The problem: every exchange has its own depth limits, rate limits, and formats, and often different data for the same instrument. Pulling data manually through a browser works for one-off tasks; systematic collection requires infrastructure.
CEX vs DEX: different sources, different logic
Centralized exchanges (CEX): Binance, OKX, Bybit, Kraken — provide REST APIs for historical OHLCV. The data is clean, aggregated, with minimal gaps. Constraints: rate limits (Binance: on the order of 1200 request weight per minute per IP) and maximum history depth (Binance returns up to 1000 candles per request, with history back to 2017).
Decentralized exchanges (DEX): Uniswap, Curve, dYdX — no native OHLCV API. Data must be computed from swap events on-chain or obtained via The Graph / DeFiLlama.
Binance parsing: proper implementation
Binance is the benchmark for history depth and API quality. The /api/v3/klines endpoint returns each candle as a positional array; a typed view of one candle:
interface BinanceKline {
  openTime: number // ms
  open: string
  high: string
  low: string
  close: string
  volume: string
  closeTime: number
  quoteAssetVolume: string
  numberOfTrades: number
  takerBuyBaseVolume: string
  takerBuyQuoteVolume: string
}
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function fetchBinanceHistory(
  symbol: string,
  interval: string, // '1m', '5m', '1h', '1d'
  startTime: number, // ms
  endTime: number
): Promise<BinanceKline[]> {
  const allKlines: BinanceKline[] = []
  let currentStart = startTime
  while (currentStart < endTime) {
    const params = new URLSearchParams({
      symbol,
      interval,
      startTime: currentStart.toString(),
      endTime: endTime.toString(),
      limit: '1000', // max per request
    })
    // /api/v3/klines is a public endpoint — no API key required; limits are per IP
    const response = await fetch(`https://api.binance.com/api/v3/klines?${params}`)
    if (response.status === 429) {
      // Rate limited: wait per Retry-After header
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10)
      await sleep(retryAfter * 1000)
      continue
    }
    const klines: any[][] = await response.json()
    if (klines.length === 0) break
    const parsed: BinanceKline[] = klines.map(k => ({
      openTime: k[0],
      open: k[1], high: k[2], low: k[3], close: k[4],
      volume: k[5],
      closeTime: k[6],
      quoteAssetVolume: k[7],
      numberOfTrades: k[8],
      takerBuyBaseVolume: k[9],
      takerBuyQuoteVolume: k[10],
    }))
    allKlines.push(...parsed)
    currentStart = klines[klines.length - 1][6] + 1 // last candle closeTime + 1ms
    await sleep(100) // crude pacing between requests
  }
  return allKlines
}
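Even with correct pagination, the resulting series can have holes: exchange downtime, delistings, thinly traded pairs. A small gap-detection helper — a sketch assuming candles are sorted ascending with a fixed interval (findGaps is a hypothetical name, not part of any exchange API):

```typescript
// Detect missing candles in a sorted series of open times.
// `intervalMs` is the candle duration (e.g. 60_000 for '1m').
// Returns the openTime of every candle that should exist but doesn't.
function findGaps(openTimes: number[], intervalMs: number): number[] {
  const missing: number[] = []
  for (let i = 1; i < openTimes.length; i++) {
    let expected = openTimes[i - 1] + intervalMs
    while (expected < openTimes[i]) {
      missing.push(expected)
      expected += intervalMs
    }
  }
  return missing
}
```

Running this after each collection pass and re-requesting the missing ranges is usually enough to close gaps caused by transient API errors.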
Rate limiting: proper implementation
A naive sleep(100) is fragile. You need a token bucket or sliding-window rate limiter, especially when collecting multiple instruments in parallel:
class RateLimiter {
  private tokens: number
  private lastRefill: number

  constructor(
    private maxTokens: number,
    private refillRate: number, // tokens per second
  ) {
    this.tokens = maxTokens
    this.lastRefill = Date.now()
  }

  async acquire(count = 1): Promise<void> {
    this.refill()
    if (this.tokens >= count) {
      this.tokens -= count
      return
    }
    // Wait until enough tokens accumulate, then consume them
    const waitMs = ((count - this.tokens) / this.refillRate) * 1000
    await sleep(waitMs)
    this.tokens = 0
    // Reset the clock, otherwise the next refill() double-counts the wait
    this.lastRefill = Date.now()
  }

  private refill(): void {
    const now = Date.now()
    const elapsed = (now - this.lastRefill) / 1000
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate)
    this.lastRefill = now
  }
}
// Binance: 1200 requests/minute = 20 requests/second
const binanceLimiter = new RateLimiter(1200, 20)
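For the sliding-window variant mentioned above, a minimal sketch. The injectable now clock is my addition for testability, not part of any exchange SDK; SlidingWindowLimiter is a hypothetical name:

```typescript
// Sliding-window rate limiter: allows at most `maxRequests` within any
// rolling `windowMs` window. The clock is injectable so the logic can be
// tested without real timers.
class SlidingWindowLimiter {
  private timestamps: number[] = []

  constructor(
    private maxRequests: number,
    private windowMs: number,
    private now: () => number = Date.now,
  ) {}

  // Non-blocking check: records the request and returns true if allowed.
  tryAcquire(): boolean {
    const cutoff = this.now() - this.windowMs
    // Drop timestamps that have fallen out of the window
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift()
    }
    if (this.timestamps.length >= this.maxRequests) return false
    this.timestamps.push(this.now())
    return true
  }
}
```

Compared to a token bucket, the sliding window never allows more than maxRequests in any window, at the cost of keeping one timestamp per recent request in memory.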
Multi-exchange collector
For strategies that need data from several exchanges at once, the ccxt library standardizes access to 100+ exchanges through a unified interface:
import ccxt from 'ccxt'

interface ExchangeConfig {
  id: string
  apiKey?: string
  secret?: string
}

interface OHLCV {
  exchange: string
  symbol: string
  timeframe: string
  timestamp: Date
  open: number
  high: number
  low: number
  close: number
  volume: number
}

class MultiExchangeOHLCVCollector {
  private exchanges: Map<string, ccxt.Exchange> = new Map()

  constructor(configs: ExchangeConfig[]) {
    for (const config of configs) {
      const ExchangeClass = ccxt[config.id as keyof typeof ccxt] as any
      this.exchanges.set(config.id, new ExchangeClass({
        apiKey: config.apiKey,
        secret: config.secret,
        enableRateLimit: true, // ccxt's built-in rate limiter
      }))
    }
  }

  async fetchOHLCV(
    exchangeId: string,
    symbol: string,
    timeframe: string,
    since?: number,
    limit = 500
  ): Promise<OHLCV[]> {
    const exchange = this.exchanges.get(exchangeId)!
    // ccxt returns: [timestamp, open, high, low, close, volume]
    const rawData = await exchange.fetchOHLCV(symbol, timeframe, since, limit)
    return rawData.map(([ts, o, h, l, c, v]) => ({
      exchange: exchangeId,
      symbol,
      timeframe,
      timestamp: new Date(ts),
      open: o, high: h, low: l, close: c, volume: v,
    }))
  }

  async fetchFullHistory(
    exchangeId: string,
    symbol: string,
    timeframe: string,
    fromDate: Date
  ): Promise<OHLCV[]> {
    const exchange = this.exchanges.get(exchangeId)!
    const all: OHLCV[] = []
    let since = fromDate.getTime()
    while (true) {
      const batch = await this.fetchOHLCV(exchangeId, symbol, timeframe, since)
      if (batch.length === 0) break
      all.push(...batch)
      since = batch[batch.length - 1].timestamp.getTime() + 1
      await exchange.sleep(exchange.rateLimit)
    }
    return all
  }
}
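One practical wrinkle with paginated fetches: the last candle of a batch is often still forming and gets re-fetched in the next batch with updated values. A sketch of merging batches with last-write-wins de-duplication; the Candle shape is a local minimal stand-in, and mergeBatches is a hypothetical helper:

```typescript
// Minimal candle shape for this sketch.
interface Candle {
  timestamp: number // ms
  open: number
  high: number
  low: number
  close: number
  volume: number
}

// Merge paginated batches: on duplicate timestamps the later batch wins,
// so a re-fetched (previously still-forming) candle carries updated values.
function mergeBatches(batches: Candle[][]): Candle[] {
  const byTs = new Map<number, Candle>()
  for (const batch of batches) {
    for (const candle of batch) byTs.set(candle.timestamp, candle)
  }
  return Array.from(byTs.values()).sort((a, b) => a.timestamp - b.timestamp)
}
```

The same last-write-wins semantics is what the ON CONFLICT upsert provides at the storage layer.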
Data storage and processing
TimescaleDB is the de facto standard for storing OHLCV data. A hypertable partitioned by timestamp provides automatic chunking and efficient range queries:
CREATE TABLE ohlcv (
time TIMESTAMPTZ NOT NULL,
exchange VARCHAR(50) NOT NULL,
symbol VARCHAR(20) NOT NULL,
timeframe VARCHAR(10) NOT NULL,
open NUMERIC(20, 8) NOT NULL,
high NUMERIC(20, 8) NOT NULL,
low NUMERIC(20, 8) NOT NULL,
close NUMERIC(20, 8) NOT NULL,
volume NUMERIC(30, 8) NOT NULL,
PRIMARY KEY (time, exchange, symbol, timeframe)
);
SELECT create_hypertable('ohlcv', 'time');
-- Compress data older than 30 days
ALTER TABLE ohlcv SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'exchange, symbol, timeframe'
);
SELECT add_compression_policy('ohlcv', INTERVAL '30 days');
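Incremental updates will re-insert candles that overlap the previous run, so a plain INSERT conflicts with the primary key above. A sketch of a batched upsert-statement builder; the column names match the schema above, while buildUpsert is a hypothetical helper and the $n placeholders assume a node-postgres-style driver:

```typescript
interface Row {
  time: string
  exchange: string
  symbol: string
  timeframe: string
  open: number
  high: number
  low: number
  close: number
  volume: number
}

// Build one multi-row INSERT ... ON CONFLICT DO UPDATE statement with
// $1..$n placeholders, so overlapping candles overwrite stale values.
function buildUpsert(rows: Row[]): { text: string; values: unknown[] } {
  const cols = ['time', 'exchange', 'symbol', 'timeframe',
                'open', 'high', 'low', 'close', 'volume']
  const values: unknown[] = []
  const tuples = rows.map((row, i) => {
    values.push(row.time, row.exchange, row.symbol, row.timeframe,
                row.open, row.high, row.low, row.close, row.volume)
    const base = i * cols.length
    return `(${cols.map((_, j) => `$${base + j + 1}`).join(', ')})`
  })
  const text =
    `INSERT INTO ohlcv (${cols.join(', ')}) VALUES ${tuples.join(', ')} ` +
    `ON CONFLICT (time, exchange, symbol, timeframe) DO UPDATE SET ` +
    `open = EXCLUDED.open, high = EXCLUDED.high, low = EXCLUDED.low, ` +
    `close = EXCLUDED.close, volume = EXCLUDED.volume`
  return { text, values }
}
```

Batching many rows per statement matters here: single-row inserts are the usual bottleneck when backfilling years of 1m candles.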
For timeframe conversion without re-fetching — Continuous Aggregates:
-- Auto aggregate 1m → 1h
CREATE MATERIALIZED VIEW ohlcv_1h
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS time,
exchange, symbol,
'1h' AS timeframe,
first(open, time) AS open,
max(high) AS high,
min(low) AS low,
last(close, time) AS close,
sum(volume) AS volume
FROM ohlcv
WHERE timeframe = '1m'
GROUP BY 1, 2, 3;
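The same 1m → 1h rollup can be mirrored in application code, e.g. to cross-check the continuous aggregate or to resample candles that never touched the database. A sketch (resample is a hypothetical helper; input must be sorted ascending by timestamp):

```typescript
interface Bar {
  timestamp: number // ms, candle open time
  open: number
  high: number
  low: number
  close: number
  volume: number
}

// Resample candles into larger buckets (e.g. 1m → 1h with bucketMs = 3_600_000),
// mirroring the SQL: first(open), max(high), min(low), last(close), sum(volume).
function resample(bars: Bar[], bucketMs: number): Bar[] {
  const out: Bar[] = []
  for (const bar of bars) {
    const bucket = Math.floor(bar.timestamp / bucketMs) * bucketMs
    const last = out[out.length - 1]
    if (last && last.timestamp === bucket) {
      last.high = Math.max(last.high, bar.high)
      last.low = Math.min(last.low, bar.low)
      last.close = bar.close // last candle in the bucket wins
      last.volume += bar.volume
    } else {
      out.push({ ...bar, timestamp: bucket }) // first candle sets the open
    }
  }
  return out
}
```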
DEX data via on-chain events
For DEX OHLCV (Uniswap V3 example) — compute from Swap events:
-- From indexed Uniswap V3 Swap events
-- sqrtPriceX96 → price (token1 per token0): price = (sqrtPriceX96 / 2^96)^2 * 10^(decimals0 - decimals1)
SELECT
time_bucket('1 hour', block_timestamp) AS time,
'uniswap-v3' AS exchange,
pair_symbol AS symbol,
first(price, block_timestamp) AS open,
max(price) AS high,
min(price) AS low,
last(price, block_timestamp) AS close,
sum(volume_usd) AS volume
FROM uniswap_swaps
GROUP BY 1, 2, 3;
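The sqrtPriceX96 conversion from the comment above, as a function. Note the decimals adjustment: for the token1-per-token0 price the exponent is decimals0 - decimals1. A sketch using Number arithmetic (sqrtPriceX96ToPrice is a hypothetical name) — adequate for charting, not for accounting, where fixed-point math is needed:

```typescript
// Convert Uniswap V3 sqrtPriceX96 to a human-readable price of token0
// denominated in token1 (i.e. the token1/token0 ratio), adjusting for
// the two tokens' decimal places.
function sqrtPriceX96ToPrice(
  sqrtPriceX96: bigint,
  decimals0: number,
  decimals1: number,
): number {
  const ratio = Number(sqrtPriceX96) / 2 ** 96
  return ratio * ratio * 10 ** (decimals0 - decimals1)
}
```

For a USDC/WETH pool (token0 = USDC with 6 decimals, token1 = WETH with 18), this yields the WETH price of one USDC; invert it to quote ETH in USDC.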
A full collector with multi-exchange support, TimescaleDB storage, incremental updates, and gap alerting is roughly 1-2 weeks of development.