NFT Analytics Platform Development
NFT analytics is more complex than DeFi analytics in one critical respect: each token has a unique price. In DeFi, a Uniswap pool gives you a clear price feed. With NFTs, you must value an asset whose last sale was three months ago, and the floor price applies to the entire collection, not to this specific token with its rare trait. Building a correct valuation model is half the work.
Data Sources and Their Limitations
On-chain events
Basic events to index:
- `Transfer(address from, address to, uint256 tokenId)` — ERC-721
- `TransferSingle` / `TransferBatch` — ERC-1155
- `OrderFulfilled` (Seaport 1.5) — sales via OpenSea
- `TakerBid` / `TakerAsk` — LooksRare v2
- `OrdersMatched` — Blur
Problem: each marketplace has its own events with its own structure. Seaport is the most complex: a single OrderFulfilled event can encode a bundle sale of multiple NFTs in one transaction, paid in an arbitrary ERC-20. Parsing this data requires fully ABI-decoding the consideration and offer arrays.
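Once a log is ABI-decoded (e.g. with viem or ethers), classifying the items is mechanical. A sketch, assuming the items are already decoded into plain objects; the `SpentItem` shape mirrors Seaport's item struct, and the item-type codes (0 = native, 1 = ERC-20, 2 = ERC-721, 3 = ERC-1155) follow the Seaport spec (criteria types 4/5 omitted for brevity):

```typescript
// Seaport item types (per the Seaport spec; criteria types omitted)
enum ItemType { NATIVE = 0, ERC20 = 1, ERC721 = 2, ERC1155 = 3 }

interface SpentItem {
  itemType: ItemType;
  token: string;      // contract address (zero address for native ETH)
  identifier: bigint; // tokenId (0 for fungibles)
  amount: bigint;
}

interface SaleSummary {
  nfts: { token: string; tokenId: bigint; amount: bigint }[];
  payments: { token: string; amount: bigint }[];
}

// Split decoded offer/consideration items into NFT legs and payment legs.
// A bundle sale simply yields multiple entries in `nfts`.
function summarizeFulfillment(items: SpentItem[]): SaleSummary {
  const summary: SaleSummary = { nfts: [], payments: [] };
  for (const item of items) {
    if (item.itemType === ItemType.ERC721 || item.itemType === ItemType.ERC1155) {
      summary.nfts.push({ token: item.token, tokenId: item.identifier, amount: item.amount });
    } else {
      // NATIVE or ERC20: aggregate payment legs by currency
      const existing = summary.payments.find((p) => p.token === item.token);
      if (existing) existing.amount += item.amount;
      else summary.payments.push({ token: item.token, amount: item.amount });
    }
  }
  return summary;
}
```

In practice you run this over the concatenation of `offer` and `consideration` to see both what moved and what was paid, including marketplace and royalty fee legs.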
The Graph vs. self-hosted indexing
The Graph — the obvious choice to start. Existing subgraphs: OpenSea (unofficial), NFT sales aggregator subgraphs on the hosted service. Limitation: the hosted service has been sunset in favor of the decentralized network, where queries cost GRT. For high-load analytics, query costs become significant.
Self-hosted via Ponder or Envio — Ponder (a TypeScript framework for on-chain indexing) lets you write event handlers as plain TypeScript and stores data in PostgreSQL. Envio is similar, with a focus on speed (written in OCaml/Rust). For a platform with custom metrics, a self-hosted indexer is preferable: full control over the data schema.
Dual approach: historical data from Dune Analytics or Reservoir API (aggregates sales from all marketplaces), real-time via WebSocket subscription to events through Alchemy or QuickNode.
Valuation Models and Metrics
Rarity scoring
Standard formula — statistical rarity:
rarity_score(token) = Σ (1 / trait_frequency) for all traits
This is what rarity.tools does. Problem: doesn't account for trait correlation. A token with a rare combination of two common traits may be rarer than the simple formula shows.
Improved approach — information content rarity (IC score):
IC(trait) = -log2(P(trait))
rarity_score = Σ IC(trait_i)
Works correctly with uneven distributions.
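Both scoring formulas are a few lines of code. A sketch over a simplified trait model (one value per trait type, all tokens sharing the same trait types); the function names are illustrative:

```typescript
type Traits = Record<string, string>; // traitType -> value

// Count trait-value frequencies across the whole collection.
function traitFrequencies(tokens: Traits[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const traits of tokens) {
    for (const [type, value] of Object.entries(traits)) {
      const key = `${type}:${value}`;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

// Statistical rarity: sum of 1 / trait_frequency (rarity.tools-style).
function statisticalRarity(token: Traits, counts: Map<string, number>, total: number): number {
  let score = 0;
  for (const [type, value] of Object.entries(token)) {
    score += 1 / ((counts.get(`${type}:${value}`) ?? 0) / total);
  }
  return score;
}

// Information-content rarity: sum of -log2(P(trait)).
function icRarity(token: Traits, counts: Map<string, number>, total: number): number {
  let score = 0;
  for (const [type, value] of Object.entries(token)) {
    const p = (counts.get(`${type}:${value}`) ?? 0) / total;
    score += -Math.log2(p);
  }
  return score;
}
```

Both scores rank a 1-in-4 trait above a 3-in-4 trait; the IC score additionally behaves sanely when one trait type has hundreds of near-unique values and another has only two.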
Price metrics
| Metric | Formula / source | Application |
|---|---|---|
| Floor price | min(active listings) | Basic reference |
| Trait floor | min(listings with trait) | Token valuation |
| Wash trade adjusted volume | volume - suspected wash trades | Real volume |
| Holder distribution | unique wallets / total supply | Decentralization |
| Listing depth | number of listings by price levels | Liquidity profile |
| Diamond hands ratio | % holders > 6 months | Retention |
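The floor and trait-floor rows from the table reduce to a min over active listings. A minimal sketch, assuming listings and trait assignments are already loaded from the indexer; the data shapes are illustrative:

```typescript
interface Listing {
  tokenId: number;
  priceWei: bigint; // active listing price
}

// Floor price: cheapest active listing in the collection.
function floorPrice(listings: Listing[]): bigint | null {
  if (listings.length === 0) return null;
  return listings.reduce((min, l) => (l.priceWei < min ? l.priceWei : min), listings[0].priceWei);
}

// Trait floor: cheapest listing among tokens carrying a given trait.
// `tokenTraits` maps tokenId -> set of "type:value" trait keys.
function traitFloor(
  listings: Listing[],
  tokenTraits: Map<number, Set<string>>,
  trait: string
): bigint | null {
  const withTrait = listings.filter((l) => tokenTraits.get(l.tokenId)?.has(trait));
  return floorPrice(withTrait);
}
```

Returning `null` rather than 0 matters: a trait with no active listings has no floor, and treating it as zero would poison downstream valuation.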
Wash trade detection
One of the key features of an analytics platform. Patterns for detection:
- Same addresses buy and sell to each other (transaction graph with cycles)
- Sales 1-3 blocks after purchase at non-market prices
- Buyer funded from same source as seller (Tornado Cash / mixer, or direct transfer)
- Repetitive patterns: A→B→A→B with price increase
Implemented via graph analysis on addresses — Neo4j, or recursive SQL (CTEs) in DuckDB, is efficient enough at this scale. For on-chain heuristics, use from/to in Transfer events plus funding-source analysis via transaction tracing (trace_transaction in Erigon, debug_traceTransaction in Geth).
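The A→B→A round-trip pattern can be caught in a single pass over sales sorted by block. A simplified sketch (real detection would also weigh price deviation and shared funding sources); the 10,000-block window is an illustrative default, not a tuned threshold:

```typescript
interface Sale {
  tokenId: number;
  from: string;  // seller address
  to: string;    // buyer address
  block: number;
}

// Flag round trips: the same token returning to a prior owner within
// a small block window (A -> B -> ... -> A). Input must be sorted by block.
function flagRoundTrips(sales: Sale[], maxBlockGap = 10_000): Sale[] {
  const flagged: Sale[] = [];
  const byToken = new Map<number, Sale[]>();
  for (const s of sales) {
    const hist = byToken.get(s.tokenId) ?? [];
    // Did this token pass through the current buyer's hands recently?
    const roundTrip = hist.some(
      (prev) => prev.from === s.to && s.block - prev.block <= maxBlockGap
    );
    if (roundTrip) flagged.push(s);
    hist.push(s);
    byToken.set(s.tokenId, hist);
  }
  return flagged;
}
```

In production this heuristic feeds a per-sale wash-trade score rather than a hard boolean, so borderline cases can still count toward adjusted volume with a discount.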
Platform Technical Stack
Indexing infrastructure
Ethereum node (Erigon)
→ Ponder indexer (TypeScript)
→ PostgreSQL (TimescaleDB extension for time-series)
→ Redis (cache floor prices, trending collections)
→ ClickHouse (analytics aggregates, OLAP queries)
TimescaleDB is critical for time-series metrics: continuous aggregates let you maintain hourly/daily OHLCV without recomputing on every query. ClickHouse is justified at volumes above ~100M events, where analytics queries run 10–100x faster than on PostgreSQL.
API layer
GraphQL via Hasura over PostgreSQL — sufficient for most queries. Custom resolvers via Hasura Actions for complex calculations (rarity score, wash trade score).
For real-time data — WebSocket via Hasura subscriptions or custom Node.js server with pub/sub via Redis Streams.
Enrichment pipeline
NFT metadata is not always on-chain. You need a pipeline:
- Get the URL from the contract's `tokenURI()` via batch RPC calls (eth_call multicall via Multicall3)
- Fetch the metadata from an IPFS gateway / HTTP
- Parse the `attributes` array
- Store in PostgreSQL with the computed rarity score
- Update when new tokens are detected (Transfer from the zero address)
Problem: IPFS fetches are unreliable. You need retries with exponential backoff, fallback across multiple gateways (Cloudflare, dweb.link, nftstorage.link), and a 5–10 second timeout.
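The retry/fallback logic can be sketched as follows. The gateway list and backoff constants are illustrative defaults, and `AbortSignal.timeout` plus global `fetch` assume Node 18+:

```typescript
// Illustrative gateway list; tune per deployment and gateway availability.
const GATEWAYS = [
  "https://cloudflare-ipfs.com/ipfs/",
  "https://dweb.link/ipfs/",
  "https://nftstorage.link/ipfs/",
];

// Exponential backoff with a cap: 500ms, 1s, 2s, 4s, ... up to capMs.
function backoffMs(attempt: number, baseMs = 500, capMs = 8_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Normalize ipfs:// URIs to candidate gateway URLs; pass HTTP URLs through.
function gatewayUrls(uri: string): string[] {
  if (!uri.startsWith("ipfs://")) return [uri];
  const path = uri.slice("ipfs://".length).replace(/^ipfs\//, "");
  return GATEWAYS.map((g) => g + path);
}

// Try each gateway in turn, with a per-request timeout and backoff between rounds.
async function fetchMetadata(uri: string, retries = 3, timeoutMs = 7_500): Promise<unknown> {
  for (let attempt = 0; attempt < retries; attempt++) {
    for (const url of gatewayUrls(uri)) {
      try {
        const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        if (res.ok) return await res.json();
      } catch {
        // timeout or network error: fall through to the next gateway
      }
    }
    await new Promise((r) => setTimeout(r, backoffMs(attempt)));
  }
  throw new Error(`metadata fetch failed: ${uri}`);
}
```

Rotating the gateway order per request also helps avoid hammering a single gateway during a large backfill.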
Frontend
Next.js 14 with App Router. Key pages:
- Collection overview: floor chart (Recharts/TradingView lightweight), volume bars, holder distribution pie
- Token detail: rarity rank, trait comparison, price history, similar sales
- Wallet analytics: portfolio valuation, P&L by collection, unrealized gains
- Market trends: trending by volume/floor change, new mints heatmap
For charts with large data volumes — TradingView Lightweight Charts (canvas rendering) is faster than Recharts on 10k+ points.
What's Hard and Takes Time
Historical sync — indexing all NFT transactions from 2017 to today takes weeks even on a fast node. Use snapshots from Dune/Reservoir for bootstrap, then catch up with live data.
Multi-chain — supporting Ethereum + Polygon + Base + Solana requires four different indexers with different ABIs and event structures. Solana is especially painful: no EVM, and events are base64-encoded program logs.
Accurate pricing — aggregating sales correctly is hard. You need to account for the currency (ETH, WETH, USDC, BLUR token), convert to USD at the historical rate at the moment of sale, and exclude wash trades. Each step adds a source of error.
Realistic timeline for MVP (one network, basic metrics, simple UI): 6–8 weeks. Full multi-chain platform with wash trade detection and portfolio analytics: 3–4 months.