Matching Engine Development
Matching engine is the core of any exchange. It accepts orders, matches buyers with sellers, and generates trades. This is the most performant and most critical component: microsecond delays matter, pricing logic errors are catastrophic. Let's explore the architecture.
What is a Matching Engine
Functions
- Order acceptance: new order is validated and directed to the correct order book
- Matching: finding compatible orders, generating trades
- Order management: partial fills, cancellations, expiry
- Event publishing: trades, book updates → downstream systems
All this must happen sequentially (single-threaded matching core) to prevent race conditions. Parallelism is added at the different trading pairs level — each symbol has its own thread/core.
Price-Time Priority (FIFO)
Standard matching algorithm:
- Price priority: order with best price executes first
- Time priority: at equal price — earlier order has priority
For buys "best price" = higher (willing to pay more). For sells = lower (willing to accept less).
Data Structures
Order Book as Sorted Data Structure
use std::collections::{BTreeMap, VecDeque, HashMap};
// Price as integer: avoid float arithmetic (1.23456789 → 123456789)
// All prices in smallest units × 10^8
type Price = i64;
type Qty = u64;
type OrderId = u64;
#[derive(Clone, Debug)]
pub struct Order {
pub id: OrderId,
pub user_id: u64,
pub symbol: String,
pub side: Side,
pub price: Price, // 0 for market orders
pub quantity: Qty,
pub filled: Qty,
pub order_type: OrderType,
pub time_in_force: TimeInForce,
pub created_at: u64, // nanoseconds since epoch
}
impl Order {
pub fn remaining(&self) -> Qty {
self.quantity - self.filled
}
pub fn is_filled(&self) -> bool {
self.filled >= self.quantity
}
}
#[derive(Debug)]
pub struct PriceLevel {
pub price: Price,
pub total_qty: Qty,
pub orders: VecDeque<OrderId>, // FIFO
}
pub struct OrderBook {
pub symbol: String,
// Bids: DESC order (highest price first) → Reverse for BTreeMap
pub bids: BTreeMap<std::cmp::Reverse<Price>, PriceLevel>,
// Asks: ASC order (lowest price first)
pub asks: BTreeMap<Price, PriceLevel>,
// Fast lookup by order ID
pub orders: HashMap<OrderId, Order>,
// Sequence number for event ordering
pub sequence: u64,
}
Using BTreeMap instead of HashMap for price levels: O(log n) for insert/delete, but iteration over levels = O(1) for best price. HashMap is O(1) for lookup, but no order. Combination of both is correct solution.
Integer Prices Instead of Floating Point
Floating point arithmetic is forbidden in financial calculations. 0.1 + 0.2 != 0.3 in IEEE 754.
// Wrong: float
let price: f64 = 43750.50;
// Correct: integer with fixed precision
const PRICE_PRECISION: i64 = 100_000_000; // 10^8
let price: i64 = 4_375_050_000_000; // 43750.50000000
fn float_to_price(f: f64) -> Price {
(f * PRICE_PRECISION as f64).round() as Price
}
fn price_to_float(p: Price) -> f64 {
p as f64 / PRICE_PRECISION as f64
}
Matching Algorithm
Limit Order Matching
(See detailed Rust implementation in previous sections)
The process:
- Match against opposite side
- Check price crossing
- Execute at matched price level (FIFO)
- Update filled quantities
- Remove filled orders from book
- Add remaining to book if not fully filled
Market Order
Executes immediately at best available price. Has no price limit.
Stop Orders
Stop orders are triggers, not active orders in the book. When stop price is reached, a real order is created.
Event Sourcing and Persistence
Event Journal
Matching engine doesn't write directly to database — this would kill performance. Instead: all events written to append-only log (Kafka), downstream consumers asynchronously update database.
Matching Engine → Kafka (trades, order_updates) → DB Writer
→ Balance Updater
→ Market Data Publisher
→ Notification Service
Kafka guarantees ordering within partition (partition by symbol → all events for one symbol strictly ordered).
Snapshot + Replay
On restart matching engine restores state from last snapshot + replay events:
Snapshot created every N minutes or M events. Without snapshots with large event count, replay can take hours.
Performance and Optimizations
Profiling Bottlenecks
Typical matching engine bottlenecks in priority order:
- Memory allocation: each new order = heap allocation. Solution: object pool / arena allocator
- Serialization: packing events into Kafka. Solution: FlatBuffers / Cap'n Proto instead of JSON
- Locking: if multiple threads, any mutex kills throughput. Solution: single-threaded per-symbol + lock-free queues
- Cache misses: suboptimal memory layout. Solution: cache-friendly structures (SoA instead of AoS)
LMAX Disruptor Pattern
LMAX Disruptor is lock-free ring buffer for inter-thread communication, developed for high-frequency trading.
Ring buffer on 2^N elements, wrapped sequence number. Producers and consumers use CAS (compare-and-swap) operations instead of mutex. Latency: ~1-2 nanoseconds vs ~100ns for mutex.
Latency Benchmarks
Realistic latency figures for different implementations:
| Implementation | P50 | P99 | P99.9 |
|---|---|---|---|
| Python (asyncio) | 2ms | 15ms | 100ms |
| Go (stdlib) | 200µs | 2ms | 10ms |
| Java (Disruptor) | 50µs | 500µs | 2ms |
| Rust (custom) | 10µs | 100µs | 500µs |
| C++ (HFT grade) | 1-5µs | 20µs | 100µs |
P99.9 (worst 0.1% of requests) is particularly important for reputation — that's where trader complaints go.
Testing
Matching engine is covered with unit tests on edge cases:
- Partial fill with remainder
- FOK with insufficient liquidity
- IOC with partial execution
- Simultaneous cancellation and fill (race condition)
- Stop order triggers at moment of placement
- Decimal overflow at extreme values
Property-based testing (fuzzing) — random order sequences generated, invariant checked: total buy volume = total sell volume, balances reconcile.
Matching engine is component written once and maintained for years. Investments in code quality, tests, and documentation pay off many times over.







