Development of API for Access to Collected Data
Collecting data is half the task. The other half is making it accessible in a format that doesn't slow clients down, doesn't collapse under load, and doesn't tell everyone what you've been scraping for the last six months. APIs for blockchain/crypto data have their own specifics: data is immutable and append-only (historical records don't change), volumes are large (millions of transactions), queries are often time-series by nature, and clients want both REST for simple queries and WebSocket for real-time.
REST API design
Versioning and URL structure
/v1/funding-rates/{symbol}?exchange=binance&from=2024-01-01&to=2024-03-01&limit=500
/v1/transactions/{chain}/{address}?from_block=19000000&limit=100
/v1/news?tags=bitcoin,regulation&from=2024-01-15T00:00:00Z&limit=50
/v1/gas/history?network=ethereum&granularity=1h&from=2024-01-01
Principles:
- limit + cursor pagination instead of page + offset — stable when new data is inserted
- Time parameters in ISO 8601 (including timezone) or unix milliseconds — accept both
- ?fields= for field selection — don't return 30 fields if the client uses 3
Cursor-based pagination
interface PaginatedResponse<T> {
data: T[];
pagination: {
cursor: string | null; // null = last page
hasMore: boolean;
total?: number; // optional, expensive to calculate
};
}
// Cursor = base64(JSON({lastId, lastTimestamp}))
// Next page request: GET /v1/funding-rates?cursor={opaque_string}
Adding new records at the beginning doesn't break client navigation through history.
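The opaque cursor from the comment above can be sketched as a pair of encode/decode helpers. This is a minimal sketch: the CursorPayload shape (lastId, lastTimestamp) follows the comment in the snippet, and using base64url instead of plain base64 is a choice to keep the cursor URL-safe.

```typescript
// Sketch of the opaque cursor described above. The payload shape is
// an assumption taken from the base64(JSON({lastId, lastTimestamp})) comment.
interface CursorPayload {
  lastId: string;
  lastTimestamp: number; // unix ms of the last row on the page
}

function encodeCursor(payload: CursorPayload): string {
  // base64url keeps the cursor safe to put in a query string without escaping
  return Buffer.from(JSON.stringify(payload)).toString("base64url");
}

function decodeCursor(cursor: string): CursorPayload | null {
  try {
    return JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));
  } catch {
    return null; // a malformed cursor is a client error, not a crash
  }
}
```

On the database side the decoded payload drives a keyset condition such as `WHERE (settled_at, id) < ($lastTimestamp, $lastId)`, which is what makes the pagination stable under inserts.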
Query parameters and validation
Zod for input validation — describes schema and converts types:
import { z } from "zod";
const FundingRatesQuerySchema = z.object({
symbol: z.string().regex(/^[A-Z]+-[A-Z]+$/, "Invalid symbol format"),
exchange: z.enum(["binance", "bybit", "okx", "hyperliquid"]).optional(),
from: z.coerce.date(),
to: z.coerce.date(),
limit: z.coerce.number().min(1).max(1000).default(100),
cursor: z.string().optional(),
});
type FundingRatesQuery = z.infer<typeof FundingRatesQuerySchema>;
Returning 400 early with detailed errors saves the client time:
{
"error": "VALIDATION_ERROR",
"details": [
{ "field": "from", "message": "Invalid date format" },
{ "field": "limit", "message": "Must be between 1 and 1000" }
]
}
Performance: caching and query optimization
Multi-layer cache
Client → CDN (static historical data, TTL 1h)
→ Application cache (Redis, TTL 30s–5m)
→ Read replica PostgreSQL / ClickHouse
Redis caching by query key:
async function getFundingRates(query: FundingRatesQuery): Promise<FundingRateRecord[]> {
const cacheKey = `fr:${query.symbol}:${query.exchange ?? "all"}:${query.from.getTime()}:${query.to.getTime()}`;
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
const data = await db.queryFundingRates(query);
// Historical data (past) cache long, real-time short
const ttl = query.to < new Date(Date.now() - 3600_000) ? 3600 : 30;
await redis.setEx(cacheKey, ttl, JSON.stringify(data));
return data;
}
Database query optimization
For time-series data in PostgreSQL — indexes are critical:
-- Covering index for typical query
CREATE INDEX CONCURRENTLY idx_funding_rates_lookup
ON funding_rates (symbol, exchange, settled_at DESC)
INCLUDE (funding_rate, mark_price);
-- Query should use this index:
EXPLAIN ANALYZE
SELECT settled_at, funding_rate, mark_price
FROM funding_rates
WHERE symbol = 'BTC-USDT'
AND exchange = 'binance'
AND settled_at BETWEEN $1 AND $2
ORDER BY settled_at DESC
LIMIT 100;
For analytical queries (aggregations, averages over periods) — ClickHouse is significantly faster than PostgreSQL.
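One way such an aggregation endpoint can translate a granularity parameter into ClickHouse SQL is sketched below. The table and column names follow the funding_rates examples above; the granularity set and the use of ClickHouse server-side query parameters ({name:Type}) are assumptions.

```typescript
// Sketch: map an API granularity value to a ClickHouse time-bucket
// expression. Granularity set is an assumption; table/column names
// follow the funding_rates examples above.
const BUCKET_FN: Record<string, string> = {
  "1m": "toStartOfMinute(settled_at)",
  "1h": "toStartOfHour(settled_at)",
  "1d": "toStartOfDay(settled_at)",
};

function buildAvgRateQuery(granularity: "1m" | "1h" | "1d"): string {
  // {symbol:String} etc. are ClickHouse server-side query parameters,
  // bound by the client driver rather than interpolated into the string
  return `
    SELECT ${BUCKET_FN[granularity]} AS bucket,
           avg(funding_rate) AS avg_rate
    FROM funding_rates
    WHERE symbol = {symbol:String}
      AND settled_at BETWEEN {from:DateTime} AND {to:DateTime}
    GROUP BY bucket
    ORDER BY bucket`;
}
```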
WebSocket for real-time data
// Fastify + @fastify/websocket
fastify.get("/v1/stream", { websocket: true }, (socket, req) => {
  const subscriptions = parseSubscriptions(req.query);
  // eventBus.on is assumed to return an unsubscribe function
  const unsubscribers = subscriptions.map((sub) =>
    eventBus.on(sub.channel, (data) => {
      if (socket.readyState === WebSocket.OPEN) {
        socket.send(JSON.stringify({ channel: sub.channel, data }));
      }
    })
  );
  socket.on("message", (msg) => {
    let cmd;
    try { cmd = JSON.parse(msg.toString()); } catch { return; } // ignore malformed frames
    if (cmd.type === "subscribe") { /* ... */ }
    if (cmd.type === "unsubscribe") { /* ... */ }
    if (cmd.type === "ping") socket.send(JSON.stringify({ type: "pong" }));
  });
  socket.on("close", () => unsubscribers.forEach((unsub) => unsub()));
});
Heartbeat: the server sends a ping every 30 seconds and the client should reply with pong. No response within 10 seconds — close the connection. This cuts off hung connections that would otherwise keep consuming memory.
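The heartbeat policy above can be sketched as pure bookkeeping plus a timer. The HeartbeatTracker class and the wiring at the bottom are assumptions, not a specific library API:

```typescript
// Sketch of the heartbeat policy above: ping every 30 s, drop a
// connection that hasn't answered within 10 s of the next ping.
// The class and wiring are assumptions.
const PING_INTERVAL_MS = 30_000;
const PONG_TIMEOUT_MS = 10_000;

class HeartbeatTracker {
  private lastPong = new Map<string, number>();

  onConnect(id: string, now: number) { this.lastPong.set(id, now); }
  onPong(id: string, now: number)    { this.lastPong.set(id, now); }
  onClose(id: string)                { this.lastPong.delete(id); }

  // Connections whose last pong is older than one ping interval plus the timeout
  stale(now: number): string[] {
    const deadline = PING_INTERVAL_MS + PONG_TIMEOUT_MS;
    return [...this.lastPong.entries()]
      .filter(([, t]) => now - t > deadline)
      .map(([id]) => id);
  }
}

// Wiring sketch (assumption: `sockets` maps connection id -> ws socket):
//   setInterval(() => {
//     for (const id of tracker.stale(Date.now())) sockets.get(id)?.terminate();
//     sockets.forEach((s) => s.ping());
//   }, PING_INTERVAL_MS);
```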
Rate limiting and auth
API keys instead of JWT for server-to-server access: they don't expire, are easy to revoke, and need no refresh flow:
// Middleware (assumes FastifyRequest is augmented via declaration
// merging to carry apiKeyId and rateLimitTier)
async function apiKeyAuth(req: FastifyRequest, reply: FastifyReply) {
  const key = req.headers["x-api-key"];
  if (!key || Array.isArray(key)) return reply.code(401).send({ error: "API key required" });
  // hGetAll returns {} for a missing key, so check a required field
  const apiKey = await redis.hGetAll(`apikey:${key}`);
  if (!apiKey.id) return reply.code(401).send({ error: "Invalid API key" });
  req.apiKeyId = apiKey.id;
  req.rateLimitTier = apiKey.tier; // "free", "standard", "premium"
}
Sliding window rate limiting via Redis Lua script — atomic operation:
-- KEYS[1] = rate limit key; ARGV = {limit, window_ms, now_ms, request_id}
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Drop entries that fell out of the window
redis.call("ZREMRANGEBYSCORE", key, 0, now - window)
local count = redis.call("ZCARD", key)
if count >= limit then return 0 end
-- The member must be unique; a bare timestamp would collapse two
-- requests arriving in the same millisecond into one entry
redis.call("ZADD", key, now, now .. "-" .. ARGV[4])
-- EXPIRE takes whole seconds; round up so the key outlives the window
redis.call("EXPIRE", key, math.ceil(window / 1000))
return 1
Rate limit headers in responses: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset — standard practice, clients can adapt behavior.
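Building those three headers from the window state is straightforward. A sketch: the tier limits, the one-minute window, and the evalSha wiring in the comment are all assumptions.

```typescript
// Sketch: derive the three standard rate-limit headers from the
// window state. Tier limits and the 60 s window are assumptions.
const TIER_LIMITS: Record<string, number> = { free: 60, standard: 600, premium: 6000 };
const WINDOW_MS = 60_000;

function rateLimitHeaders(limit: number, used: number, nowMs: number) {
  return {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(Math.max(0, limit - used)),
    // Unix seconds at which the current window has fully rolled over
    "X-RateLimit-Reset": String(Math.ceil((nowMs + WINDOW_MS) / 1000)),
  };
}

// Middleware sketch calling the Lua script (assumption; node-redis v4 style):
//   const allowed = await redis.evalSha(slidingWindowSha, {
//     keys: [`rl:${req.apiKeyId}`],
//     arguments: [String(limit), String(WINDOW_MS), String(Date.now()), req.id],
//   });
//   if (allowed === 0) return reply.code(429).send({ error: "RATE_LIMITED" });
```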
Observability
Structured logging of each request with: api_key_id, endpoint, query_params (no secrets), response_time_ms, status_code, cache_hit.
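Keeping secrets out of query_params deserves an explicit step rather than trust. A sketch of a redaction helper; the deny-list below is an assumption to extend per deployment:

```typescript
// Sketch: drop secret-bearing query params before they reach the log.
// The deny-list is an assumption; extend it per deployment.
const SECRET_PARAMS = new Set(["api_key", "apikey", "token", "signature"]);

function sanitizeQueryParams(params: Record<string, string>): Record<string, string> {
  return Object.fromEntries(
    // Case-insensitive match so "Token" and "token" are both dropped
    Object.entries(params).filter(([k]) => !SECRET_PARAMS.has(k.toLowerCase()))
  );
}
```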
Prometheus metrics:
const httpRequestDuration = new Histogram({
name: "http_request_duration_ms",
help: "HTTP request duration in milliseconds",
labelNames: ["method", "route", "status_code"],
buckets: [5, 10, 25, 50, 100, 250, 500, 1000, 2500],
});
P99 latency for each endpoint — main SLA metric. Alert if P99 > 500ms for cached requests or > 2000ms for uncached.
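Feeding the histogram above from each response is typically a hook. A sketch: the pure label builder is testable on its own, while the Fastify wiring in the comment (reply.elapsedTime, routeOptions.url) is an assumption about the framework version in use.

```typescript
// Sketch: build the label set for the histogram declared above.
// Treating a request that matched no route as "unmatched" is a choice
// to keep the label cardinality bounded.
function metricLabels(method: string, route: string | undefined, statusCode: number) {
  return { method, route: route ?? "unmatched", status_code: String(statusCode) };
}

// Fastify wiring sketch (assumption; field names per recent Fastify v4):
//   fastify.addHook("onResponse", async (req, reply) => {
//     httpRequestDuration.observe(
//       metricLabels(req.method, req.routeOptions?.url, reply.statusCode),
//       reply.elapsedTime
//     );
//   });
```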
Stack: Fastify (Node.js) for REST + WebSocket, Redis for cache and rate limiting, PostgreSQL/ClickHouse for data, Prometheus + Grafana for monitoring.
A realistic timeline for building this API on top of an existing database with data: 4–7 weeks, including auth, rate limiting, and documentation (OpenAPI spec).