Development of Pump-and-Dump Detection Model

A pump-and-dump in crypto unfolds faster than in traditional markets: from the start of coordinated buying to the dump can take hours or even minutes. On-chain data is completely public, which creates a unique detection opportunity: wallet movements, volume concentration, and transaction synchronization can all be observed in real time.

The task is to build a system that detects a P&D scheme during the pump phase, before the dump, in order to warn users or automatically protect the protocol.

Anatomy of a pump-and-dump scheme

Understanding the mechanics is critical for building the right features.

Accumulation phase: organizers gradually buy the token with small orders, trying not to move the price. Signs: the number of unique holder addresses grows while the price stagnates, unusual buy volume at off-peak hours, synchronized wallets (e.g. receiving ETH from a single source).

Pump phase: coordinated buying, usually organized in Telegram/Discord. The price rises 200-2000% within hours. Volume spikes to 10-100x the average. Social media activity surges with templated messages.

Dump phase: organizers sell at the peak. Retail buyers attracted by the rise enter and are left holding the bags. The price crashes to pre-pump levels or lower.

Features for model

On-chain metrics

Volume anomaly score:

VAS = current_volume / rolling_avg_volume_30d

Values above 10 without fundamental news are a strong signal.
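
As a sketch in plain Python (a hypothetical `volume_anomaly_score` helper, using the 30-period window from the formula):

```python
def volume_anomaly_score(volumes, window: int = 30) -> float:
    """VAS = latest volume / average volume over the prior `window` periods."""
    if len(volumes) <= window:
        return 0.0  # not enough history to judge
    baseline = sum(volumes[-window - 1:-1]) / window  # exclude the current period
    if baseline == 0:
        return 0.0
    return volumes[-1] / baseline

# 30 quiet days, then a 20x spike
print(volume_anomaly_score([100.0] * 30 + [2000.0]))  # → 20.0
```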

Holder concentration delta: the change in the HHI (Herfindahl-Hirschman Index):

HHI = Σ (balance_i / total_supply)²

A rising HHI means the token is concentrating in fewer addresses, i.e. accumulation.
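
A minimal computation (hypothetical helper; the `holder_hhi_delta` feature is then the difference between consecutive snapshots):

```python
def holder_hhi(balances) -> float:
    """Herfindahl-Hirschman Index: sum of squared holder shares of total supply."""
    total = sum(balances)
    if total == 0:
        return 0.0
    return sum((b / total) ** 2 for b in balances)

# 100 equal holders: dispersed supply
print(round(holder_hhi([10] * 100), 4))       # → 0.01
# One holder owns 90%: concentrated supply
print(round(holder_hhi([90] + [1] * 10), 4))  # → 0.811
```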

Transaction synchronization: a coefficient of how synchronized buys from independent addresses are within the same time window (±5 minutes). Organic growth is spread roughly uniformly over time; a P&D shows a spike.

Wallet clustering: build a graph of address relationships. Addresses that receive ETH from the same source, buy through the same EOA, or show similar transaction patterns are probably controlled by a single entity. If 60%+ of volume comes from one cluster, that is a signal.
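
One common approximation is union-find over shared funding sources: wallets funded by the same address land in one cluster. A sketch (the `(wallet, funder)` edge shape is a hypothetical input format):

```python
from collections import defaultdict

def cluster_by_funder(funding_edges):
    """Group wallets that received their initial ETH from the same source."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for wallet, funder in funding_edges:
        union(wallet, funder)

    clusters = defaultdict(set)
    for wallet, _ in funding_edges:
        clusters[find(wallet)].add(wallet)
    return list(clusters.values())

# Wallets A, B, C funded by F1; D funded by F2
edges = [('A', 'F1'), ('B', 'F1'), ('C', 'F1'), ('D', 'F2')]
print(sorted(sorted(c) for c in cluster_by_funder(edges)))
# → [['A', 'B', 'C'], ['D']]
```

The cluster's share of buy volume can then be checked against the 60% threshold above.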

Price-volume divergence: in healthy growth, volume rises gradually along with price. In a P&D, a volume surge precedes the sharp price move, or both spike simultaneously without a gradual ramp-up.

Cross-market metrics

DEX vs CEX price discrepancy: if the DEX price is significantly above the CEX price, intentional manipulation of the DEX price is possible.

Liquidity depth change: a sharp removal of LP liquidity before a pump reduces price resistance, a classic preparation pattern.

New wallet ratio: the percentage of transactions from wallets created less than 7 days ago. A high value means fresh addresses were set up for the scheme.
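
A sketch of the computation (the transaction-dict and `first_seen` map shapes are hypothetical):

```python
def new_wallet_ratio(transactions, first_seen, now, max_age_days: int = 7) -> float:
    """Share of transactions sent from wallets first seen < max_age_days ago.

    `transactions` is a list of dicts with a 'from' field; `first_seen` maps
    wallet -> unix timestamp of its first on-chain activity.
    """
    if not transactions:
        return 0.0
    cutoff = now - max_age_days * 86400
    fresh = sum(1 for tx in transactions if first_seen[tx['from']] > cutoff)
    return fresh / len(transactions)

now = 1_700_000_000
first_seen = {'0xaaa': now - 3600, '0xbbb': now - 90 * 86400}  # 1h old vs 90d old
txs = [{'from': '0xaaa'}, {'from': '0xaaa'}, {'from': '0xbbb'}, {'from': '0xbbb'}]
print(new_wallet_ratio(txs, first_seen, now))  # → 0.5
```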

Social signals (optional)

Monitor Telegram/Discord for ticker mentions. A sudden spike in mentions plus positive sentiment plus templated buy calls signals a coordinated pump.

Detection system architecture

Data pipeline

Blockchain RPC (geth/erigon) 
    → Event streaming (WebSocket)
    → Kafka / RabbitMQ
    → Feature extractor (Python)
    → Feature store (Redis realtime, PostgreSQL historical)
    → ML model inference
    → Alert engine

Real-time blockchain connection via WebSocket (a sketch assuming web3.py v7; replace KEY with your provider key):

import asyncio

from web3 import AsyncWeb3, WebSocketProvider

async def stream_swaps(token_address: str, callback):
    # web3.py v7: persistent WebSocket connection as an async context manager
    async with AsyncWeb3(WebSocketProvider('wss://mainnet.infura.io/ws/v3/KEY')) as w3:
        # Subscribe to Transfer events of the token
        transfer_filter = await w3.eth.filter({
            'address': w3.to_checksum_address(token_address),
            'topics': [w3.keccak(text='Transfer(address,address,uint256)')],
        })

        while True:
            events = await transfer_filter.get_new_entries()
            for event in events:
                await callback(event)
            await asyncio.sleep(0.1)

Feature extraction

from dataclasses import dataclass

import numpy as np
import pandas as pd

@dataclass
class TokenFeatures:
    token_address: str
    timestamp: float
    volume_anomaly_score: float
    new_wallet_ratio: float
    transaction_sync_score: float
    holder_hhi_delta: float
    liquidity_depth_change: float
    price_velocity: float

def compute_sync_score(
    transactions: pd.DataFrame,
    window_seconds: int = 300
) -> float:
    """How synchronized independent addresses are in their buys (0..1)."""
    tx_times = transactions['timestamp'].values
    unique_senders = transactions['from'].nunique()
    if unique_senders < 2:
        return 0.0

    # Histogram of transactions by time windows
    bins = np.arange(tx_times.min(), tx_times.max() + window_seconds, window_seconds)
    hist, _ = np.histogram(tx_times, bins=bins)
    if len(hist) < 2:
        return 1.0  # all activity inside one window: maximal synchronization

    # High variance across windows = bursty, synchronized buying;
    # organic activity is spread evenly (low coefficient of variation)
    if hist.mean() == 0:
        return 0.0
    cv = hist.std() / hist.mean()
    return min(1.0, cv / 2)

ML model

For P&D detection, XGBoost or LightGBM on tabular features works well: the models are interpretable (via SHAP values), fast at inference, and robust to missing data.

import xgboost as xgb
from sklearn.model_selection import TimeSeriesSplit

# Split by time: we must not use future data to predict the past
tscv = TimeSeriesSplit(n_splits=5)

# neg_count / pos_count are the class counts in the training labels
model = xgb.XGBClassifier(
    n_estimators=500,
    max_depth=6,
    learning_rate=0.01,
    subsample=0.8,
    colsample_bytree=0.8,
    scale_pos_weight=neg_count / pos_count,  # compensate for class imbalance
    eval_metric='aucpr',       # PR-AUC matters under heavy imbalance
    early_stopping_rounds=50   # constructor parameter since xgboost 1.6
)

Evaluation metrics: precision and recall matter more than accuracy because of the strong class imbalance. Target: precision > 0.7 at recall > 0.6. False positives (false alarms) annoy users; false negatives (missed P&Ds) damage reputation.

Alerting implementation

Thresholds and confidence levels

Not a binary "P&D / not P&D" verdict, but a probability with thresholds:

  • > 0.8: high confidence, immediate alert
  • 0.6 - 0.8: medium confidence, warning
  • < 0.6: monitoring, no alert
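
The threshold mapping is a few lines (hypothetical `alert_level` helper):

```python
def alert_level(probability: float) -> str:
    """Map the model's output probability to an alert tier."""
    if probability > 0.8:
        return 'alert'    # high confidence: immediate alert
    if probability >= 0.6:
        return 'warning'  # medium confidence
    return 'monitor'      # keep watching, no alert

print(alert_level(0.93), alert_level(0.7), alert_level(0.3))
# → alert warning monitor
```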

Integration with protocol

For protocols that need protection: the trading contract can read the risk score via an oracle. If the risk is high, it can apply stricter trading limits (for example on slippage) or pause the specific pool.

Limitations and disclaimers

A detection system does not eliminate P&D; it warns about it. Organizers adapt to detection algorithms (adversarial behavior), so model quality degrades over time and requires periodic retraining.

Legal side: automatically blocking trades based on ML predictions carries legal risk depending on the jurisdiction. It is safer to warn users than to restrict trading automatically.

Development timeline

Dataset collection and labeling: 3-4 weeks; model development: 2-3 weeks; infrastructure and alerting: 3-5 weeks; testing: 2 weeks.

Total: 8-14 weeks.