Open Interest Data Scraping from Exchanges
Open Interest (OI) is the total size of outstanding futures/options positions. It is a key indicator in derivatives analytics: a sudden OI rise alongside a falling price suggests new shorts being opened, while OI rising together with price points to long accumulation. But this requires reliable data, and every exchange exposes OI through its own API with its own quirks.
Data sources: Where to get OI
Centralized derivatives exchanges publish OI via REST and WebSocket APIs:
| Exchange | Endpoint | Notes |
|---|---|---|
| Binance | `GET /fapi/v1/openInterest` (perp), `/futures/data/openInterestHist` (history) | History limited to 30 days; 5min/15min/1h granularity |
| Bybit | `GET /v5/market/open-interest` | `intervalTime` param: 5min, 15min, 30min, 1h, 4h, 1d |
| OKX | `GET /api/v5/public/open-interest` | Supports futures, swaps, and options |
| dYdX v4 | GraphQL API or Indexer REST | On-chain; public data, no API keys required |
| GMX v2 | On-chain via the Reader contract | No centralized API |
For an aggregate market picture you need all major platforms. A single dominant exchange skews the metrics: in 2022 FTX accounted for ~30% of market OI, and after its collapse every aggregate index dropped.
Collector Architecture
The key decision is polling vs WebSocket. Most exchanges provide OI only via REST (not WebSocket: OI is not high-frequency like price). The optimal approach is scheduled polling every 1-5 minutes.
```python
import asyncio
import aiohttp
from datetime import datetime, timezone
from decimal import Decimal


class OICollector:
    def __init__(self, db, symbols: list[str]):
        self.db = db
        self.symbols = symbols
        self.session: aiohttp.ClientSession | None = None

    async def __aenter__(self):
        self.session = aiohttp.ClientSession()
        return self

    async def __aexit__(self, *exc):
        await self.session.close()

    async def get_price(self, symbol: str) -> Decimal:
        ...  # price lookup, e.g. from a local ticker cache

    async def collect_binance_oi(self, symbol: str) -> dict:
        url = "https://fapi.binance.com/fapi/v1/openInterest"
        async with self.session.get(url, params={"symbol": symbol}) as resp:
            resp.raise_for_status()
            data = await resp.json()
        oi = Decimal(data["openInterest"])
        return {
            "exchange": "binance",
            "symbol": symbol,
            "oi_value": oi,
            "oi_usd": oi * await self.get_price(symbol),
            "timestamp": datetime.fromtimestamp(data["time"] / 1000, tz=timezone.utc),
        }

    async def collect_bybit_oi(self, symbol: str) -> dict:
        url = "https://api.bybit.com/v5/market/open-interest"
        params = {
            "category": "linear",
            "symbol": symbol,
            "intervalTime": "5min",
            "limit": 1,
        }
        async with self.session.get(url, params=params) as resp:
            resp.raise_for_status()
            data = await resp.json()
        item = data["result"]["list"][0]
        return {
            "exchange": "bybit",
            "symbol": symbol,
            "oi_value": Decimal(item["openInterest"]),
            "timestamp": datetime.fromtimestamp(int(item["timestamp"]) / 1000, tz=timezone.utc),
        }

    async def collect_all(self):
        tasks = []
        for symbol in self.symbols:
            tasks.extend([
                self.collect_binance_oi(symbol),
                self.collect_bybit_oi(symbol),
            ])
        results = await asyncio.gather(*tasks, return_exceptions=True)
        valid = [r for r in results if not isinstance(r, Exception)]
        await self.db.bulk_insert(valid)
```
Rate limits and bypass strategies
Each exchange enforces rate limits: Binance fapi allows 2400 weight/minute (openInterest costs 1 weight), Bybit 600 requests per 5 seconds. A single OI poll is cheap, but add price lookups, history backfills, and 50+ pairs across several endpoints and the budget tightens quickly.
Strategies:
Symbol prioritization. Poll BTC and ETH every minute, the top 20 by volume every 5 minutes, and the rest every 15-30 minutes.
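One way to sketch the tiering: map intervals to symbol sets and poll only what is due on each scheduler tick (the tier membership below is illustrative, not a recommendation):

```python
# Illustrative tiers; real membership would come from volume rankings.
POLL_TIERS = {
    60: {"BTCUSDT", "ETHUSDT"},   # every minute
    300: {"SOLUSDT", "XRPUSDT"},  # top pairs: every 5 minutes
}
DEFAULT_INTERVAL = 900            # everything else: 15 minutes


def symbols_due(all_symbols: list[str], tick_sec: int) -> list[str]:
    """Return the symbols whose polling interval divides the current tick."""
    due = []
    for sym in all_symbols:
        interval = next(
            (i for i, syms in POLL_TIERS.items() if sym in syms),
            DEFAULT_INTERVAL,
        )
        if tick_sec % interval == 0:
            due.append(sym)
    return due
```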
Parallel collection with rate limit guard:
```python
import asyncio
import time
from collections import deque


class RateLimitedCollector:
    # Illustrative budgets: exchange -> (max requests, window seconds)
    LIMITS = {"binance": (1200, 60), "bybit": (600, 5)}

    def __init__(self, max_concurrent: int = 10):
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.request_times: dict[str, deque] = {}  # exchange -> timestamps

    async def enforce_rate_limit(self, exchange: str):
        max_req, window = self.LIMITS[exchange]
        times = self.request_times.setdefault(exchange, deque())
        now = time.monotonic()
        while times and now - times[0] > window:  # evict stale timestamps
            times.popleft()
        if len(times) >= max_req:  # window full: wait until the oldest expires
            await asyncio.sleep(times[0] + window - now)
        times.append(time.monotonic())

    async def throttled_request(self, exchange: str, coro):
        async with self.semaphore:
            await self.enforce_rate_limit(exchange)
            return await coro
```
IP rotation. If data volume requires more than one collector instance, run each with its own IP: residential proxies or a separate VPS per exchange.
WebSocket for prices. Take prices from WebSocket streams (high frequency) and OI from REST on a schedule; this avoids burning REST budget on price calls.
Data normalization
Different exchanges return OI in different units:
- Binance BTCUSDT perp — in BTC (number of contracts × 1 BTC per contract)
- Bybit BTCUSDT — in USD (base currency × price)
- OKX BTC-USDT-SWAP — in contracts (1 contract = 0.01 BTC)
- CME Bitcoin Futures — in contracts (1 contract = 5 BTC)
For a comparable aggregate, convert everything to USD:

```python
from decimal import Decimal


def normalize_to_usd(oi_value: Decimal, unit: str, btc_price: Decimal) -> Decimal:
    match unit:
        case "BTC":
            return oi_value * btc_price
        case "USD" | "USDT":
            return oi_value
        case "contracts_0.01BTC":  # OKX swap
            return oi_value * Decimal("0.01") * btc_price
        case "contracts_5BTC":  # CME
            return oi_value * Decimal("5") * btc_price
        case _:
            raise ValueError(f"Unknown OI unit: {unit}")
```
Storage and aggregation
TimescaleDB is a natural fit for time-series OI data:
```sql
CREATE TABLE open_interest (
    time         TIMESTAMPTZ NOT NULL,
    exchange     TEXT NOT NULL,
    symbol       TEXT NOT NULL,
    oi_contracts NUMERIC(30, 8),
    oi_usd       NUMERIC(30, 2),
    PRIMARY KEY (time, exchange, symbol)
);
SELECT create_hypertable('open_interest', 'time');

-- Continuous aggregate: OI summed across all exchanges
CREATE MATERIALIZED VIEW oi_aggregate_5m
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('5 minutes', time) AS bucket,
    symbol,
    SUM(oi_usd) AS total_oi_usd,
    jsonb_object_agg(exchange, oi_usd) AS by_exchange
FROM open_interest
GROUP BY bucket, symbol;
```
OI change alerts: Trading signals
A sharp OI change is itself a trading signal. Common thresholds:
- OI up > 5% in 1 hour — significant position opening
- OI down > 10% in 1 hour — liquidations or mass closing
```sql
-- OI change over the last hour (12 × 5-minute buckets).
-- LAG must be computed before filtering to the latest bucket:
-- window functions run after WHERE, so filtering first would
-- leave LAG with nothing to look back at.
WITH with_lag AS (
    SELECT
        bucket,
        symbol,
        total_oi_usd,
        LAG(total_oi_usd, 12) OVER (PARTITION BY symbol ORDER BY bucket) AS oi_1h_ago
    FROM oi_aggregate_5m
)
SELECT
    symbol,
    total_oi_usd AS current_oi,
    oi_1h_ago,
    (total_oi_usd - oi_1h_ago) / oi_1h_ago * 100 AS change_1h_pct
FROM with_lag
WHERE bucket = (SELECT MAX(bucket) FROM with_lag)
ORDER BY ABS((total_oi_usd - oi_1h_ago) / oi_1h_ago) DESC NULLS LAST;
```
Additional metrics
Combining OI with other data gives a fuller picture:

Long/Short ratio: available on Binance (`/futures/data/globalLongShortAccountRatio`) and Bybit. Shows trader positioning.

Funding rate: the cost of holding a perpetual position. High positive funding plus high OI signals overheated longs.

OI-weighted funding: the average funding rate across exchanges, weighted by each exchange's OI. A more accurate aggregate than a simple mean.
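The OI weighting reduces to a weighted mean; a minimal sketch, assuming funding rates and OI keyed by exchange name:

```python
from decimal import Decimal


def oi_weighted_funding(rates: dict[str, Decimal],
                        oi_usd: dict[str, Decimal]) -> Decimal:
    """Average funding across exchanges, weighted by each exchange's OI share."""
    total_oi = sum(oi_usd.values())
    if total_oi == 0:
        raise ValueError("no open interest to weight by")
    return sum(rates[ex] * oi_usd[ex] for ex in rates) / total_oi
```

With this, an exchange holding 75% of the OI contributes 75% of the aggregate funding figure, rather than the equal share a simple mean would give it.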
Building a collector for 5-7 exchanges with normalization and basic aggregates takes 1-2 weeks; a full analytics pipeline with alerts, an API, and a dashboard, 3-4 weeks.