Scraping Whale Transaction Data
In on-chain analysis, a "whale" is an address whose holdings or transaction volume are significant relative to overall market liquidity. A transfer of 50,000 ETH from an exchange wallet to a cold wallet can create price pressure and is itself an information signal. Monitoring such movements is a practical task for trading systems, risk management, and on-chain analytics.
What Exactly to Track
Not all large transactions are equally informative. Key patterns:
- Exchange inflow/outflow: a large transfer to an exchange (inflow) is a potential sale; a transfer from an exchange (outflow) suggests accumulation or a move to self-custody. Correct interpretation requires a database of exchange addresses.
- Cross-chain bridges: large movements through bridges (Arbitrum bridge, Stargate, LayerZero) signal liquidity moving between networks.
- DeFi events: a large liquidity withdrawal from a Uniswap pool, a large loan repayment in Aave, or the opening or closing of a large position on GMX.
- Stablecoin mint/burn: Tether and Circle mint and burn USDT and USDC as fiat is deposited and redeemed. A large mint is a potential capital inflow to the market.
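The exchange inflow/outflow pattern can be sketched as a small classifier. This is a minimal illustration, assuming a hypothetical in-memory label map `LABELS` keyed by lowercase address; the `exchange_to_exchange` and `unlabeled` result names are also assumptions made for the example.

```python
# Hypothetical label map: lowercase address -> (entity_name, entity_type).
# A real system would load this from its label database.
LABELS = {
    "0x28c6c06298d514db089934071355e5743bf21d60": ("Binance", "exchange"),
}


def is_exchange(address: str, labels: dict) -> bool:
    """True when the address is labeled as belonging to an exchange."""
    label = labels.get(address.lower())
    return label is not None and label[1] == "exchange"


def classify_transfer(from_addr: str, to_addr: str, labels: dict = LABELS) -> str:
    """Return a coarse event type based on which side is a known exchange."""
    src = is_exchange(from_addr, labels)
    dst = is_exchange(to_addr, labels)
    if src and dst:
        return "exchange_to_exchange"
    if dst:
        return "exchange_inflow"   # funds arriving at an exchange
    if src:
        return "exchange_outflow"  # funds leaving an exchange
    return "unlabeled"
```

The quality of the result depends entirely on label coverage: an exchange deposit address missing from the map is reported as `unlabeled`, not as an inflow.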
Ethereum: Monitoring via eth_getLogs and WebSocket
Large ERC-20 transfers can be monitored in real time through a WebSocket subscription to Transfer events. Node-side log filters match only on contract address and indexed topics, and the transfer amount sits in the unindexed data field, so filtering by size has to happen in the application:
import asyncio
from web3 import AsyncWeb3, WebSocketProvider

WHALE_THRESHOLD_USDT = 500_000 * 10**6  # 500k USDT (USDT has 6 decimals)
USDT_ADDRESS = "0xdAC17F958D2ee523a2206206994597C13D831ec7"

def topic_to_address(topic) -> str:
    # An indexed address is left-padded to 32 bytes; keep the last 20
    return '0x' + bytes(topic)[-20:].hex()

async def monitor_usdt_whales():
    # Persistent WebSocket connection (web3.py v7 API)
    async with AsyncWeb3(WebSocketProvider("wss://eth-mainnet.g.alchemy.com/v2/YOUR_KEY")) as w3:
        transfer_topic = w3.to_hex(w3.keccak(text="Transfer(address,address,uint256)"))
        await w3.eth.subscribe('logs', {
            'address': USDT_ADDRESS,
            'topics': [transfer_topic],
        })
        async for message in w3.socket.process_subscriptions():
            event = message['result']
            # 'data' holds the unindexed uint256 amount as raw bytes
            amount = int.from_bytes(bytes(event['data']), 'big')
            if amount >= WHALE_THRESHOLD_USDT:
                # process_whale_transfer is an application-defined coroutine
                await process_whale_transfer({
                    'from': topic_to_address(event['topics'][1]),
                    'to': topic_to_address(event['topics'][2]),
                    'amount_usdt': amount / 10**6,
                    'tx_hash': w3.to_hex(event['transactionHash']),
                    'block': event['blockNumber'],
                })

# Run with: asyncio.run(monitor_usdt_whales())
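A subscription only delivers logs emitted while the connection is open, so gaps after a restart or disconnect are usually filled with eth_getLogs over a block range. The sketch below shows one way to do that in chunks, since many providers cap the range of a single request. It assumes a hypothetical `get_logs` callable that wraps the eth_getLogs JSON-RPC call and returns raw logs with hex-string fields; the chunk size of 2,000 blocks is an arbitrary example value.

```python
from typing import Callable, Iterator, List, Tuple

# keccak256("Transfer(address,address,uint256)")
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def block_ranges(start: int, end: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (from_block, to_block) pairs covering start..end."""
    current = start
    while current <= end:
        upper = min(current + chunk - 1, end)
        yield current, upper
        current = upper + 1


def decode_transfer(log: dict) -> dict:
    """Decode a raw JSON-RPC Transfer log whose fields are hex strings."""
    topics = log["topics"]
    return {
        "from": "0x" + topics[1][-40:],  # last 20 bytes of the padded topic
        "to": "0x" + topics[2][-40:],
        "amount": int(log["data"], 16),
        "tx_hash": log["transactionHash"],
        "block": int(log["blockNumber"], 16),
    }


def backfill(get_logs: Callable[[dict], list], address: str, start: int,
             end: int, threshold: int, chunk: int = 2_000) -> List[dict]:
    """Collect transfers of at least `threshold` base units in start..end."""
    whales = []
    for lo, hi in block_ranges(start, end, chunk):
        logs = get_logs({
            "address": address,
            "topics": [TRANSFER_TOPIC],
            "fromBlock": hex(lo),
            "toBlock": hex(hi),
        })
        for log in logs:
            transfer = decode_transfer(log)
            if transfer["amount"] >= threshold:
                whales.append(transfer)
    return whales
```

Passing the RPC call in as a function keeps the chunking and decoding logic independent of any particular client library.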
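Native ETH transfers emit no logs, so they need separate logic: fetch each block via eth_getBlockByNumber with full transaction objects (full_transactions=True in web3.py) and filter by value. ETH moved inside contract calls (internal transactions) is not visible this way. Top-level transfers can be scanned as follows: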
async def scan_block_for_whale_eth(w3: AsyncWeb3, block_number: int, threshold_eth: float):
    # full_transactions=True returns transaction objects instead of hashes
    block = await w3.eth.get_block(block_number, full_transactions=True)
    threshold_wei = w3.to_wei(threshold_eth, 'ether')
    whale_txns = [
        tx for tx in block.transactions
        if tx['value'] >= threshold_wei
    ]
    return whale_txns
Bitcoin: UTXO Model
Bitcoin has no Transfer events: value moves between unspent transaction outputs (UTXOs). Large transactions are tracked by monitoring the mempool and new blocks. Through the Bitcoin Core RPC:
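from bitcoinrpc.authproxy import AuthServiceProxy  # python-bitcoinrpc package

# Placeholder credentials; use the rpcuser/rpcpassword of your own node
rpc = AuthServiceProxy("http://rpcuser:rpcpassword@127.0.0.1:8332")

def find_whale_transactions(block_hash: str, threshold_btc: float):
    # Verbosity 2 returns fully decoded transactions (positional argument)
    block = rpc.getblock(block_hash, 2)
    whale_txns = []
    for tx in block['tx']:
        # Sum of all spendable outputs. This includes change returned to
        # the sender, so it overstates the amount actually transferred.
        total_output = sum(
            vout['value']
            for vout in tx['vout']
            # Bitcoin Core reports OP_RETURN outputs as type 'nulldata'
            if vout.get('scriptPubKey', {}).get('type') != 'nulldata'
        )
        if total_output >= threshold_btc:
            whale_txns.append({
                'txid': tx['txid'],
                'total_btc': total_output,
                'outputs': tx['vout'],
                'input_count': len(tx['vin']),
            })
    return whale_txns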
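Summing every output counts change as part of the transfer. One rough way to reduce that is to skip outputs that pay back to an address already seen on the inputs. The sketch below assumes transactions decoded with getblock verbosity 3 (available in recent Bitcoin Core releases), where each input carries a `prevout` object; the function name is hypothetical. It only catches change sent to a reused address, so change sent to a fresh address is still counted.

```python
from decimal import Decimal


def estimate_transferred_btc(tx: dict) -> Decimal:
    """Sum outputs that do not pay back to any address seen on the inputs."""
    input_addresses = {
        vin["prevout"]["scriptPubKey"].get("address")
        for vin in tx["vin"]
        if "prevout" in vin  # coinbase inputs have no prevout
    }
    input_addresses.discard(None)

    total = Decimal("0")
    for vout in tx["vout"]:
        script = vout.get("scriptPubKey", {})
        if script.get("type") == "nulldata":
            continue  # OP_RETURN data carrier, not a payment
        if script.get("address") in input_addresses:
            continue  # treated as change back to the sender
        total += Decimal(str(vout["value"]))
    return total
```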
Labeling: Who is Who
A raw address such as 0x28C6c06298d514Db089934071355E5743bf21d60 carries no meaning on its own. It becomes useful only once it is labeled, that is, matched against a knowledge base of which entity controls which address.
Label sources:
- Arkham Intelligence — commercial database with entity labels
- Etherscan tags — community-submitted labels, available via API
- Dune Analytics — community datasets (known exchange addresses, protocols)
- Custom database — built up during on-chain activity analysis
Typical label database structure:
CREATE TABLE address_labels (
    address     TEXT NOT NULL,
    chain       TEXT NOT NULL,
    entity_name TEXT,      -- 'Binance', 'Coinbase', 'Jump Trading'
    entity_type TEXT,      -- 'exchange', 'market_maker', 'fund', 'whale'
    confidence  SMALLINT,  -- 1-100
    source      TEXT,
    verified    BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (address, chain)
);
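A lookup against this table might look like the following sketch. It uses an in-memory SQLite database as a stand-in so the example is self-contained; the simplified column types, the helper names, and the default confidence cut-off of 70 are assumptions for illustration. EVM addresses are lowercased before storage and lookup because hex addresses are case-insensitive, while other formats (for example Bitcoin base58) are left untouched.

```python
import sqlite3


def normalize(address: str) -> str:
    """Lowercase hex (EVM) addresses; leave case-sensitive formats as-is."""
    return address.lower() if address.startswith(("0x", "0X")) else address


def open_label_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the address_labels table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE address_labels (
            address     TEXT NOT NULL,
            chain       TEXT NOT NULL,
            entity_name TEXT,
            entity_type TEXT,
            confidence  INTEGER,
            source      TEXT,
            verified    INTEGER DEFAULT 0,
            PRIMARY KEY (address, chain)
        )
    """)
    return conn


def add_label(conn, address, chain, entity_name, entity_type, confidence, source):
    """Insert or overwrite the label for (address, chain)."""
    conn.execute(
        "INSERT OR REPLACE INTO address_labels "
        "(address, chain, entity_name, entity_type, confidence, source) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (normalize(address), chain, entity_name, entity_type, confidence, source),
    )


def lookup_label(conn, address, chain, min_confidence=70):
    """Return (entity_name, entity_type), or None if unknown or low confidence."""
    return conn.execute(
        "SELECT entity_name, entity_type FROM address_labels "
        "WHERE address = ? AND chain = ? AND confidence >= ?",
        (normalize(address), chain, min_confidence),
    ).fetchone()
```

Keying on (address, chain) matters because the same EVM address can belong to different contracts or owners on different networks.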
Aggregation and Storage
Whale events should be stored with context for subsequent analysis:
CREATE TABLE whale_events (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chain         TEXT NOT NULL,
    tx_hash       TEXT NOT NULL,
    block_number  BIGINT,
    block_time    TIMESTAMPTZ NOT NULL,
    from_address  TEXT NOT NULL,
    to_address    TEXT NOT NULL,
    token_address TEXT,     -- NULL for native coin
    amount_raw    NUMERIC,
    amount_usd    NUMERIC,
    from_label    TEXT,
    to_label      TEXT,
    event_type    TEXT,     -- 'exchange_inflow', 'exchange_outflow', 'defi_exit', etc.
    notified      BOOLEAN DEFAULT FALSE
);

CREATE INDEX ON whale_events (block_time DESC);
CREATE INDEX ON whale_events (from_address, block_time DESC);
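One aggregation this kind of storage makes possible is net exchange flow per hour. The sketch below works on rows already loaded as dictionaries with the `block_time`, `event_type`, and `amount_usd` fields; the function name and the sign convention (inflow positive, outflow negative) are assumptions for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_exchange_netflow(events: list) -> dict:
    """Net USD flow into exchanges per hour: inflow adds, outflow subtracts."""
    netflow = defaultdict(float)
    for event in events:
        # Truncate the timestamp to the start of its hour
        hour = event["block_time"].replace(minute=0, second=0, microsecond=0)
        if event["event_type"] == "exchange_inflow":
            netflow[hour] += float(event["amount_usd"])
        elif event["event_type"] == "exchange_outflow":
            netflow[hour] -= float(event["amount_usd"])
        # Other event types do not affect exchange balances
    return dict(netflow)
```

A persistently positive value means more funds are arriving at exchanges than leaving them in that window.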
Notifications
Real-time alerts can go to a Telegram bot or a Discord webhook. A message should pack the key facts into a few lines:
🐋 WHALE ALERT — Ethereum
💰 50,000,000 USDT ($50.0M)
📤 Binance (0x28C6...21d60)
📥 Unknown Wallet (0xF9e...3a14)
🔗 tx: 0x7f8...b2c
⏱ 12 seconds ago | Block 19,847,231
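A message like this can be built from a stored event and pushed through the Telegram Bot API's `sendMessage` method. The sketch below is a minimal illustration: the event keys mirror the whale_events columns, while `symbol` and `amount` (in whole token units) are assumed extra fields, and the layout loosely follows the sample above without the relative timestamp.

```python
import json
import urllib.request


def short_addr(value: str) -> str:
    """Abbreviate an address or hash to its first 6 and last 4 characters."""
    return f"{value[:6]}...{value[-4:]}"


def format_alert(event: dict) -> str:
    """Render a whale event as a multi-line alert message."""
    usd_millions = event["amount_usd"] / 1_000_000
    sender = event["from_label"] or "Unknown Wallet"
    receiver = event["to_label"] or "Unknown Wallet"
    return "\n".join([
        f"🐋 WHALE ALERT - {event['chain']}",
        f"💰 {event['amount']:,.0f} {event['symbol']} (${usd_millions:.1f}M)",
        f"📤 {sender} ({short_addr(event['from_address'])})",
        f"📥 {receiver} ({short_addr(event['to_address'])})",
        f"🔗 tx: {short_addr(event['tx_hash'])}",
        f"Block {event['block_number']:,}",
    ])


def send_telegram(token: str, chat_id: str, text: str) -> None:
    """POST the message to the Telegram Bot API sendMessage endpoint."""
    request = urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Keeping formatting separate from delivery makes it straightforward to reuse the same text for a Discord webhook.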
Thresholds should be configurable per asset and per event type, for example through an admin interface or an env file.
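The env-file variant could be as simple as the sketch below. The default values and the `WHALE_MIN_USD_<SYMBOL>` variable naming are assumptions chosen for the example, with thresholds expressed in USD so they can be compared across assets.

```python
import os

# Hypothetical defaults, in USD
DEFAULT_THRESHOLDS_USD = {"USDT": 500_000.0, "ETH": 1_000_000.0, "BTC": 1_000_000.0}


def load_thresholds(env=os.environ) -> dict:
    """Start from the defaults and apply WHALE_MIN_USD_<SYMBOL> overrides."""
    thresholds = dict(DEFAULT_THRESHOLDS_USD)
    for symbol in thresholds:
        raw = env.get(f"WHALE_MIN_USD_{symbol}")
        if raw:
            thresholds[symbol] = float(raw)
    return thresholds


def should_alert(symbol: str, amount_usd: float, thresholds: dict) -> bool:
    """Alert only for configured assets at or above their threshold."""
    limit = thresholds.get(symbol)
    return limit is not None and amount_usd >= limit
```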
Ready-Made Services vs Custom Parser
Whale Alert, Lookonchain, and Arkham offer free and paid tiers with ready-made alerts. A custom parser is justified when you need custom logic (specific contracts or patterns), when the data feeds a trading system with latency requirements, or when you need integration with a proprietary label database.
Developing a whale transaction monitoring system for ETH and BTC with Telegram alerts, a label database of 5,000+ addresses, and history storage takes 2–3 weeks.







