Scraping DeFi Protocol Data (TVL, APY, Pools)

The task looks simple at first: collect TVL and APY across protocols, store them in a database, serve them via an API. In practice, each protocol has its own calculation logic, some data lives only on-chain, some arrives via subgraphs with a delay, and APY changes every block. On top of that, many protocols deploy different versions on different chains with incompatible ABIs.

Data sources and their specifics

The Graph: primary aggregated data source

Most major protocols have official subgraphs: Uniswap, Curve, Aave, Compound, Balancer, Yearn. The Graph Studio allows querying historical and current data via GraphQL.

Problems we face:

Latency. Subgraphs update with a 1-10 minute delay after on-chain events. That rules them out for real-time monitoring; for historical data and dashboards they are fine.

Outdated subgraphs. The Uniswap V2 subgraph has long been unmaintained by the team, so its data may be incomplete. The official Uniswap V3 subgraph periodically lags under high volume.

Pagination. The Graph returns at most 1000 records per query. Fetching all Uniswap V3 pools (>50,000) requires paginating via skip or the id_gt pattern:

query GetPools($lastId: String) {
  pools(first: 1000, where: { id_gt: $lastId }, orderBy: id) {
    id
    token0 { symbol, decimals }
    token1 { symbol, decimals }
    totalValueLockedUSD
    volumeUSD
    feeTier
  }
}
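
The id_gt loop that drives the query above can be sketched in a few lines of TypeScript. Here fetchPage stands in for any GraphQL client call (e.g. graphql-request executing the query) and is injected, so the loop itself stays independent of transport; the Pool shape is trimmed to the fields the loop needs.

```typescript
interface Pool {
  id: string;
  totalValueLockedUSD: string;
}

// Stands in for a GraphQL client call executing the query above:
// returns up to `first` pools with id > lastId, ordered by id.
type FetchPage = (lastId: string, first: number) => Promise<Pool[]>;

// Walks the id_gt cursor; a short page signals the last one.
async function fetchAllPools(
  fetchPage: FetchPage,
  pageSize = 1000,
): Promise<Pool[]> {
  const all: Pool[] = [];
  let lastId = "";
  for (;;) {
    const page = await fetchPage(lastId, pageSize);
    all.push(...page);
    if (page.length < pageSize) break; // last page reached
    lastId = page[page.length - 1].id;
  }
  return all;
}
```

Using id_gt instead of skip matters at this scale: The Graph caps skip at a few thousand, while the id cursor paginates arbitrarily deep.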

A note on TVL in The Graph: the Uniswap V3 subgraph calculates TVL as the sum of token values in USD via an internal price feed. That price feed sometimes returns wrong values for illiquid tokens: a pool with $500k of real TVL can show as $50M because of a manipulated single-token price. Cross-check against an external price source.
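
The cross-check reduces to a cheap ratio guard. A minimal sketch (the function name and the default threshold are assumptions, not from any library): the external estimate would typically come from independent token prices multiplied by on-chain balances.

```typescript
// Flags a pool whose subgraph TVL diverges from an independently
// priced estimate by more than `maxRatio` in either direction.
function tvlLooksSuspicious(
  subgraphTvlUsd: number,
  externalTvlUsd: number,
  maxRatio = 5,
): boolean {
  if (externalTvlUsd <= 0) return subgraphTvlUsd > 0;
  const ratio = subgraphTvlUsd / externalTvlUsd;
  return ratio > maxRatio || ratio < 1 / maxRatio;
}
```

Suspicious pools go to a quarantine list rather than the API: serving a manipulated $50M TVL is worse than serving nothing.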

On-chain queries for accurate data

For data that must be accurate and fresh, go straight to the contracts with eth_call:

Aave v3 TVL: Pool.getReserveData(asset) returns the liquidityIndex; per-asset TVL is the aToken's scaled supply multiplied by that index. Repeat for each asset in each market.

Curve APY: Minter.minted(gauge, user) tracks CRV emission per user, gauge.inflation_rate() gives the current emission rate. Real APY = (crv_per_year * crv_price) / gauge_tvl_usd.

Uniswap V3 fee APY: positions.tokensOwed0/1 hold accumulated fees per position. For pool-wide fee APY, take the delta of pool.feeGrowthGlobal0X128 over a period and divide by active liquidity.
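
The Curve reward leg above is plain arithmetic once the on-chain reads are done. A sketch under stated assumptions: inflationRatePerSec is gauge.inflation_rate() already converted from wei to whole CRV, and gaugeWeight (this gauge's share of total emissions, from the gauge controller) is passed in rather than read here.

```typescript
const SECONDS_PER_YEAR = 365 * 86_400;

// Reward APY for a Curve-style gauge.
// inflationRatePerSec: CRV emitted per second (from gauge.inflation_rate)
// gaugeWeight: this gauge's share of total emissions, 0..1
function gaugeRewardApy(
  inflationRatePerSec: number,
  gaugeWeight: number,
  crvPriceUsd: number,
  gaugeTvlUsd: number,
): number {
  const crvPerYear = inflationRatePerSec * gaugeWeight * SECONDS_PER_YEAR;
  return (crvPerYear * crvPriceUsd) / gaugeTvlUsd;
}
```

Doing the wei-to-CRV and weight lookups outside the formula keeps the calculation itself trivially unit-testable.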

Multicall3 (0xcA11bde05977b3631167028862bE2a173976CA11) is deployed on all major chains and batches hundreds of eth_call reads into a single RPC request. Instead of 100 separate RPC requests, one batch. Critical for scraper performance.
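
In practice the call list still has to be split: a single aggregate3 payload has practical size limits. A sketch of the batching step; the Call3 struct mirrors Multicall3's aggregate3 input, callData is pre-encoded elsewhere (e.g. with an ethers Interface), and the 500-calls-per-batch default is an assumption to tune per provider.

```typescript
// Mirrors the tuple Multicall3.aggregate3 expects per call.
interface Call3 {
  target: string;        // contract address
  allowFailure: boolean; // don't revert the whole batch on one failure
  callData: string;      // pre-encoded calldata, 0x-prefixed hex
}

// Splits a large call list into Multicall3-sized batches;
// each batch becomes one aggregate3 eth_call.
function toBatches(calls: Call3[], batchSize = 500): Call3[][] {
  const batches: Call3[][] = [];
  for (let i = 0; i < calls.length; i += batchSize) {
    batches.push(calls.slice(i, i + batchSize));
  }
  return batches;
}
```

Setting allowFailure to true per call is usually the right default for scraping: one broken pool contract shouldn't fail the other 499 reads in the batch.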

DeFi Llama API

https://api.llama.fi is a public API, no key required, with TVL data for most protocols. Endpoints:

GET /tvl/{protocol}           → current TVL
GET /protocol/{protocol}      → historical TVL + breakdown
GET /pools                    → APY across all pools (~10k records; served from yields.llama.fi)

/pools is a gold mine: APY is already calculated for thousands of pools across chains. But DeFi Llama refreshes only every few minutes, so real-time tasks still need your own calculation.
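
A sketch of consuming that feed. The LlamaPool fields below (chain, project, tvlUsd, apy) match DeFi Llama's published response shape, but treat the exact field names as an assumption to verify; the filtering logic is the point, and the minTvlUsd floor exists because sky-high APY on dust pools is noise.

```typescript
interface LlamaPool {
  chain: string;
  project: string;
  symbol: string;
  tvlUsd: number;
  apy: number | null;
}

// Top-N pools by APY on one chain, ignoring dust pools.
function topPoolsByApy(
  pools: LlamaPool[],
  chain: string,
  n: number,
  minTvlUsd = 100_000,
): LlamaPool[] {
  return pools
    .filter(p => p.chain === chain && p.apy !== null && p.tvlUsd >= minTvlUsd)
    .sort((a, b) => (b.apy ?? 0) - (a.apy ?? 0))
    .slice(0, n);
}

// Usage (network call, not executed here):
// const res = await fetch("https://yields.llama.fi/pools");
// const { data } = (await res.json()) as { data: LlamaPool[] };
// const best = topPoolsByApy(data, "Ethereum", 10);
```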

Data collection architecture

Collection layers

Scheduler (cron / event-driven)
  ├── GraphQL Fetcher (The Graph subgraphs)
  ├── On-chain Fetcher (Multicall3 + ethers.js)
  ├── HTTP Fetcher (DeFi Llama, CoinGecko)
  └── WebSocket Listener (real-time events)
        ↓
  Normalizer (single format)
        ↓
  TimescaleDB / PostgreSQL
        ↓
  API (REST/GraphQL)

The normalizer is the key component. Each protocol returns data in its own format; normalize everything to { protocolId, chainId, poolAddress, tvlUsd, apy, timestamp }. A single schema enables cross-protocol comparison.
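
One adapter per source, all emitting the same record. A minimal sketch for the Uniswap V3 subgraph case; the interface names are illustrative, and the raw shape is trimmed to the fields used (subgraphs return numerics as strings, which is why the adapter converts).

```typescript
interface NormalizedPool {
  protocolId: string;
  chainId: number;
  poolAddress: string;
  tvlUsd: number;
  apy: number | null;
  timestamp: number; // unix seconds
}

// Shape as the Uniswap V3 subgraph returns it: numerics are strings.
interface UniV3RawPool {
  id: string;
  totalValueLockedUSD: string;
}

// One adapter per source; every adapter emits NormalizedPool.
function normalizeUniV3(raw: UniV3RawPool, now: number): NormalizedPool {
  return {
    protocolId: "uniswap-v3",
    chainId: 1, // mainnet; other deployments get their own adapter config
    poolAddress: raw.id.toLowerCase(),
    tvlUsd: Number(raw.totalValueLockedUSD),
    apy: null, // fee APY computed separately from feeGrowthGlobal deltas
    timestamp: now,
  };
}
```

Lowercasing addresses at the normalization boundary avoids a whole class of join bugs later: checksummed and lowercase forms of the same pool otherwise land as two rows.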

APY calculation

APY is annual percentage yield, with compounding. Most DeFi protocols report APR (no compounding), which needs conversion:

APY = (1 + APR/n)^n - 1, where n is the number of compounding periods per year.

For lending protocols the rate is usually already compounded (Aave v3 derives it via liquidityRate). For LP positions it is not: fees accrue without reinvestment.
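
The conversion formula as code, for the cases where the source gives simple APR:

```typescript
// Simple APR to APY, compounded n times per year:
// APY = (1 + APR/n)^n - 1
function aprToApy(apr: number, n: number): number {
  return Math.pow(1 + apr / n, n) - 1;
}

// aprToApy(0.10, 365) ≈ 0.10516 — 10% APR compounded daily
```

The choice of n should reflect how often rewards actually compound for the position being modeled (daily for auto-compounders, 1 for plain fee accrual), not a blanket constant.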

Real APY components for Uniswap V3 LP position:

  1. Trading fees APR (depends on volume and position range)
  2. Liquidity mining rewards (if any incentives)
  3. Minus IL (historical estimate)

Quoting APY without the IL deduction misleads users. Show both numbers.
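
Combining the three components is trivial; the hard part, estimating IL, happens upstream. A sketch (names are illustrative): ilAprEstimate is a historical, annualized impermanent-loss estimate supplied as an input, not something this function can compute.

```typescript
// Gross vs net annualized yield for an LP position.
// feeApr: trading fees; rewardsApr: liquidity-mining incentives;
// ilAprEstimate: historical IL estimate, annualized (an input).
function lpApy(feeApr: number, rewardsApr: number, ilAprEstimate: number) {
  const gross = feeApr + rewardsApr;
  return { gross, net: gross - ilAprEstimate };
}
```

Returning both figures makes it easy for the API layer to follow the advice above and show the gross and net numbers side by side.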

Error handling and rate limiting

RPC providers have rate limits. Alchemy's free tier: 300 CUPS (compute units per second). One eth_call costs 10-40 CU; a Multicall3 call costs 20 CU regardless of how many requests it carries. Batch as aggressively as possible.

The Graph: 1000 requests per day on the free plan. Use a cache with a TTL; most data doesn't need refreshing more often than every 5 minutes.

Retry with exponential backoff on all HTTP requests, plus a dead-letter queue for failed fetches so data isn't lost to transient RPC failures.
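
A minimal backoff wrapper in that spirit (generic, no library assumed): on final failure it rethrows, which is the hook for routing the job to the dead-letter queue.

```typescript
// Retries fn with exponential backoff: delay, 2*delay, 4*delay, ...
// Rethrows the last error so the caller can route the job to a DLQ.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 4,
  baseDelayMs = 250,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```

In production, add jitter to the delay so a fleet of scrapers doesn't retry in lockstep against an already struggling RPC endpoint.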

Development stack

TypeScript + Node.js for scrapers. PostgreSQL + TimescaleDB for time-series storage. Redis for intermediate data caching. Docker Compose for local development.

ethers.js v6 for on-chain interactions, graphql-request for The Graph queries, p-limit for concurrency control so the RPC isn't overwhelmed.

Timeline estimates

A scraper for 2-3 protocols on one chain with a basic API: 2-3 days. A multi-protocol, multi-chain system with historical storage and normalization: 1-2 weeks, depending on the number of sources and the APY accuracy requirements.