Mempool Data Scraping

Parsing Mempool Data

The mempool is the queue of unconfirmed transactions visible to any node. It is the raw material for MEV bots, risk monitoring systems, front-running detection, arbitrage strategies and market analytics. The difference between looking at confirmed transactions and looking at the mempool is the difference between post-fact analysis and the ability to act before block inclusion.

What is Mempool From Technical Perspective

On Ethereum, each full node maintains its own txpool — an in-memory structure of unconfirmed transactions. eth_getTransactionByHash on a pending transaction returns its data before block inclusion, and txpool_content (a Geth-specific endpoint) returns the whole mempool in one request.

The mempool is not global — different nodes see different subsets depending on P2P topology. A transaction propagates via the gossip protocol, not instantly: if it is sent via one RPC, another RPC may not see it for several seconds.

For MEV-sensitive applications it is important to understand that private mempools exist (Flashbots, MEV Blocker, block builders) — transactions go directly to a builder, bypassing the public mempool. Such transactions are invisible to any mempool monitoring.
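
From the submission side, routing around the public mempool is just eth_sendRawTransaction pointed at a private relay's RPC URL (rpc.flashbots.net for Flashbots Protect). A minimal sketch — the helper only builds the JSON-RPC payload, and the raw transaction hex is assumed to be already signed:

```python
import json

# Flashbots Protect public RPC endpoint; other private relays work the same way
FLASHBOTS_PROTECT_RPC = "https://rpc.flashbots.net"

def build_private_tx_payload(signed_tx_hex: str, request_id: int = 1) -> dict:
    """Build a standard eth_sendRawTransaction JSON-RPC payload.

    POSTing this to a private relay instead of a public RPC keeps the
    transaction out of the public mempool until block inclusion."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_sendRawTransaction",
        "params": [signed_tx_hex],
    }

payload = build_private_tx_payload("0x02f8...")  # hypothetical signed tx hex
body = json.dumps(payload)  # POST to FLASHBOTS_PROTECT_RPC with any HTTP client
```

The request shape is identical to a public submission; only the endpoint changes, which is why a monitor watching the public mempool never sees these transactions.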

Methods to Connect to Mempool

eth_subscribe("newPendingTransactions")

The simplest method is a WebSocket subscription:

import { createPublicClient, webSocket } from 'viem';

const client = createPublicClient({
  transport: webSocket('wss://eth-mainnet.g.alchemy.com/v2/YOUR_KEY'),
});

// Only transaction hashes
const unwatch = client.watchPendingTransactions({
  onTransactions: async (hashes) => {
    for (const hash of hashes) {
      // For each hash need separate request for details
      const tx = await client.getTransaction({ hash });
      await processPendingTx(tx);
    }
  },
});

The problem: at high traffic (Ethereum mainnet sees 50-200 pending transactions per second) the monitor can't keep up requesting details for each hash, and the queue accumulates.

eth_subscribe("newPendingTransactions") with Full TX Body

Some nodes support an extended mode that delivers the full transaction body:

import asyncio
import json
import websockets

async def subscribe_mempool_full():
    async with websockets.connect("wss://your-private-node:8546") as ws:
        # Standard Geth subscription with includeTransactions=true
        await ws.send(json.dumps({
            "jsonrpc": "2.0",
            "id": 1,
            "method": "eth_subscribe",
            "params": ["newPendingTransactions", True]  # True = include full tx
        }))
        
        ack = json.loads(await ws.recv())
        subscription_id = ack["result"]
        
        async for raw in ws:
            msg = json.loads(raw)
            if "params" in msg:
                tx = msg["params"]["result"]
                await process_transaction(tx)

Not all providers support the full-body flag. Alchemy and Infura do, as does QuickNode; public Infura without an API key does not.

txpool_content and txpool_inspect

For a snapshot of the entire mempool at a single moment:

import httpx
import asyncio

async def snapshot_mempool(rpc_url: str) -> dict:
    """Complete mempool snapshot — only for own node!"""
    async with httpx.AsyncClient() as client:
        resp = await client.post(rpc_url, json={
            "jsonrpc": "2.0",
            "method": "txpool_content",
            "params": [],
            "id": 1
        })
    
    data = resp.json()["result"]
    
    pending_txs = []
    for sender, nonce_map in data["pending"].items():
        for nonce, tx in nonce_map.items():
            pending_txs.append({
                **tx,
                "status": "pending",
                "from": sender,
            })
    
    queued_txs = []
    for sender, nonce_map in data["queued"].items():
        for nonce, tx in nonce_map.items():
            queued_txs.append({**tx, "status": "queued", "from": sender})
    
    return {"pending": pending_txs, "queued": queued_txs}

txpool_content can return tens of thousands of transactions. It is a heavy request — don't call it more than once per second.
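
For lightweight health polling there is also txpool_status (likewise Geth-specific), which returns only the pending/queued counts as hex strings. A small sketch — the parser is separated out so it carries no HTTP dependency:

```python
def parse_txpool_status(result: dict) -> dict:
    """txpool_status returns hex-encoded counts,
    e.g. {"pending": "0x1af", "queued": "0x12"}."""
    return {key: int(value, 16) for key, value in result.items()}

async def poll_txpool_status(rpc_url: str) -> dict:
    import httpx  # imported lazily so the parser above stays dependency-free
    async with httpx.AsyncClient() as client:
        resp = await client.post(rpc_url, json={
            "jsonrpc": "2.0",
            "method": "txpool_status",
            "params": [],
            "id": 1,
        })
    return parse_txpool_status(resp.json()["result"])
```

Polling this every few seconds is cheap and gives the "pending tx count" metric discussed in the monitoring section without pulling the full pool.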

Decoding Calldata

Raw calldata is an ABI-encoded function call. The first 4 bytes are the function selector — the first 4 bytes of the keccak256 hash of the function signature. The rest is the ABI-encoded arguments.

from eth_abi import decode
from eth_utils import function_abi_to_4byte_selector
import json

# Load Uniswap V2 Router ABI
with open('uniswap_v2_router_abi.json') as f:
    abi = json.load(f)

# Build mapping selector → function
selector_to_func = {}
for func in abi:
    if func['type'] == 'function':
        selector = function_abi_to_4byte_selector(func).hex()
        selector_to_func[selector] = func

def decode_calldata(calldata: str) -> dict | None:
    if len(calldata) < 10:
        return None
    
    selector = calldata[2:10]  # remove '0x'
    func = selector_to_func.get(selector)
    if not func:
        return None
    
    input_types = [inp['type'] for inp in func['inputs']]
    decoded = decode(input_types, bytes.fromhex(calldata[10:]))
    
    return {
        'function': func['name'],
        'args': dict(zip([i['name'] for i in func['inputs']], decoded))
    }

# Example: decode Uniswap V2 swapExactTokensForTokens
tx_data = "0x38ed1739000000000000000000000000000000000000000000000000..."
result = decode_calldata(tx_data)
# {'function': 'swapExactTokensForTokens', 'args': {'amountIn': 1000, ...}}

For unknown selectors, the 4byte.directory database contains over a million signatures:

import httpx

async def lookup_selector(selector: str) -> str | None:
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"https://www.4byte.directory/api/v1/signatures/?hex_signature=0x{selector}")
        results = resp.json().get("results", [])
        return results[0]["text_signature"] if results else None

High-Performance Mempool Monitor Architecture

Under serious load (the full Ethereum mempool is 50-300 tx/sec) the bottleneck is not the network but processing. Architecture:

[Multiple P2P Nodes] → [Raw Tx Stream]
                              ↓
                    [Decoder Worker Pool]  ← N processes
                         /    |    \
              [Kafka Topic: decoded_txs]
               /         |         \
    [MEV Detector]  [Volume Monitor]  [Alert Engine]
                              ↓
                    [TimescaleDB / ClickHouse]

Kafka (or Redis Streams) as a data bus is critical: decoding and storage must be asynchronous from receipt. If the DB slows down, incoming data must not be lost.
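
The decoupling principle can be sketched without Kafka itself — an in-process bounded queue stands in for the topic, and the writers consume at their own pace (the decode and store steps here are toy placeholders):

```python
import asyncio

async def run_pipeline(raw_txs: list, n_workers: int = 4) -> int:
    """Decode and store stages are decoupled by a bounded queue: if the
    store stage stalls, backpressure pauses the decoder instead of
    silently losing transactions."""
    bus: asyncio.Queue = asyncio.Queue(maxsize=1000)  # stand-in for a Kafka topic
    stored = 0

    async def decoder(txs):
        for tx in txs:
            decoded = {**tx, "decoded": True}  # placeholder for real ABI decoding
            await bus.put(decoded)             # blocks when the bus is full
        for _ in range(n_workers):
            await bus.put(None)                # poison pills to stop the writers

    async def writer():
        nonlocal stored
        while (item := await bus.get()) is not None:
            stored += 1                        # placeholder for the DB insert

    await asyncio.gather(decoder(raw_txs), *[writer() for _ in range(n_workers)])
    return stored
```

With a real broker the queue boundary also survives process restarts, which an in-memory queue does not — that durability is why the bus sits between receipt and storage.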

Your own node is mandatory for serious mempool monitoring. Public RPCs (Alchemy, Infura) throttle pending subscriptions and don't expose txpool_content. For Ethereum: a Geth or Reth node with 32 GB RAM and an NVMe SSD.

P2P Connection (devp2p)

The lowest level, with the most complete access, is connecting to the Ethereum P2P network directly as a non-mining peer:

# libp2p / devp2p libraries for Python: py-evm, pydevp2p
# More practical use ready solutions:
# - Blocknative Mempool Explorer API
# - TxStreet
# - EigenPhi (for MEV analysis)
# Or Rust: reth with custom hooks for mempool events

At the P2P level you see transactions before they reach RPC providers. For MEV bots, the first 100 ms is the competitive advantage.

Specifics of Other Networks

Solana. There is no mempool in the traditional sense: transactions go directly to the leader validator via QUIC, and there is no public mempool API. Monitoring via getSignaturesForAddress with polling is post-confirmation only. For pre-confirmation visibility you need to connect as a gRPC client to a validator via Jito's MEV infrastructure or similar.

Bitcoin. The getrawmempool RPC returns the txids of all pending transactions; getmempoolentry returns the details of a specific one. A ZMQ subscription (zmqpubrawtx) delivers events about new transactions in real time:

import zmq
import asyncio

context = zmq.asyncio.Context()
sock = context.socket(zmq.SUB)
sock.connect("tcp://127.0.0.1:28333")
sock.subscribe(b"rawtx")

async def monitor_bitcoin_mempool():
    while True:
        # bitcoind ZMQ messages have 3 parts: topic, body, sequence number
        topic, raw_tx, seq = await sock.recv_multipart()
        tx = decode_bitcoin_tx(raw_tx)  # your own deserializer / library call
        await process_bitcoin_pending_tx(tx)
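
decode_bitcoin_tx above is best left to a library, but one piece is easy to do by hand: the txid is the double SHA-256 of the raw transaction bytes, byte-reversed for display. A sketch:

```python
import hashlib

def txid_of(raw_tx: bytes) -> str:
    """Bitcoin txid = SHA256(SHA256(raw_tx)), displayed in reversed byte order.

    Note: for segwit transactions the txid is computed over the
    serialization *without* witness data; this helper assumes it is
    given that non-witness serialization."""
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return digest[::-1].hex()
```

Computing the txid locally lets the monitor deduplicate rawtx events and correlate them with getmempoolentry lookups without an extra RPC round trip.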

TON. The sharded architecture complicates mempool monitoring — transactions are processed in parallel in different shards. The TonCenter API provides an endpoint for the pending transactions of a specific account.

MEV Pattern Detection

A practical application of mempool data is detecting sandwich attacks and arbitrage:

def detect_sandwich_setup(pending_txs: list[dict]) -> list[dict]:
    """Find potential sandwich attacks: large swap in mempool"""
    candidates = []
    
    for tx in pending_txs:
        decoded = decode_calldata(tx['input'])
        if not decoded:
            continue
        
        # Uniswap V2 / V3 swaps
        if decoded['function'] in ['swapExactTokensForTokens', 'exactInputSingle']:
            amount_in_usd = get_usd_value(decoded['args'])  # your own pricing helper
            
            if amount_in_usd > SANDWICH_THRESHOLD_USD:  # e.g. $50K+
                # maxFeePerGas (EIP-1559) or legacy gasPrice, both hex strings
                gas_hex = tx.get('maxFeePerGas') or tx.get('gasPrice') or '0x0'
                candidates.append({
                    'tx_hash': tx['hash'],
                    'from': tx['from'],
                    'function': decoded['function'],
                    'amount_usd': amount_in_usd,
                    'gas_price': int(gas_hex, 16),
                })
    
    return candidates

Storage and Retention Policy

The mempool generates huge data volumes — on Ethereum mainnet, several GB per day with full logging. Recommended policy:

Data                                 Retention   Storage
Confirmed tx metadata                Forever     PostgreSQL
Pending tx (until confirmation)      24 hours    Redis + periodic flush
Full calldata of pending txs         72 hours    ClickHouse (columnar)
Confirmed OHLCV aggregates           Forever     TimescaleDB
Dropped / replaced transactions      7 days      PostgreSQL

Transactions that are ultimately not included in a block (dropped, or replaced by a higher-gas resubmission) are a separate category. Storing them is useful for analyzing gas wars and transaction replacement.
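
A replacement is identifiable purely from mempool data: same sender and nonce, higher gas price. A minimal in-memory classifier, assuming the standard JSON-RPC transaction field names:

```python
def make_replacement_tracker():
    """Track (sender, nonce) pairs and classify each incoming pending tx."""
    seen: dict = {}  # (from, nonce) -> {"hash": ..., "gas": ...}

    def classify(tx: dict) -> str:
        key = (tx["from"].lower(), int(tx["nonce"], 16))
        gas = int(tx.get("maxFeePerGas") or tx.get("gasPrice") or "0x0", 16)
        prev = seen.get(key)
        seen[key] = {"hash": tx["hash"], "gas": gas}
        if prev is None:
            return "new"
        # same sender+nonce with a gas bump is a replacement (gas-war signal)
        return "replacement" if gas > prev["gas"] else "duplicate"

    return classify
```

Feeding every pending transaction through this classifier yields the "dropped/replaced" rows for the 7-day retention bucket above.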

System Monitoring

Key metrics for the mempool monitor:

  • Mempool lag — delay between receiving a tx and writing it to the DB. Alert at > 500 ms
  • Decoder throughput — transactions/sec; must keep up with the incoming flow
  • Dropped messages — losses in the Kafka/Redis Streams queue
  • Node connectivity — status of the WebSocket connection to each node
  • Pending tx count — abnormal growth (> 200K pending) signals a congestion event
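
The first metric is computed from two timestamps per transaction (first seen vs. written to the DB). A sketch of the p95 calculation behind the 500 ms alert, using the nearest-rank method:

```python
def p95_lag_ms(first_seen: list, db_written: list) -> float:
    """Per-tx lag = write time minus first-seen time (seconds);
    returns the 95th percentile in milliseconds (nearest-rank)."""
    lags = sorted((w - s) * 1000 for s, w in zip(first_seen, db_written))
    if not lags:
        return 0.0
    idx = max(0, int(len(lags) * 0.95) - 1)
    return lags[idx]

def lag_alert(first_seen: list, db_written: list, threshold_ms: float = 500) -> bool:
    """True when the p95 mempool lag breaches the alert threshold."""
    return p95_lag_ms(first_seen, db_written) > threshold_ms
```

A percentile is preferable to the mean here: a handful of slow DB writes should trip the alert even when the average lag still looks healthy.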