What latency is required for an RTB auction?

Most exchanges require a response within 100ms, and Google Display Network within 50ms. Our infrastructure fits within 30ms, including network, feature extraction, and model inference. This provides headroom for additional optimizations.

What models are used for CTR and CVR prediction?

For tabular bid request data we use LightGBM as it provides the best AUC/latency ratio. For sequences (user history) we use a Transformer encoder with PyTorch. Models are exported to ONNX for fast inference (<1ms).

How is budget pacing implemented?

We adjust bids based on spend rate: if spending exceeds the plan, the pacing factor drops to 0.8; if behind, it increases to 1.2. This ensures even budget distribution throughout the day, preventing early exhaustion.

AI for Programmatic Advertising: RTB, CTR & Budget Pacing

AI System for Programmatic Advertising: Maximizing RTB Auctions

On one project, a DSP spent 70% of its budget in the first 4 hours, after which the campaign stalled. We implemented budget pacing and CTR models, which smoothed spend and increased conversions by 25%. The main pain points are unstable CTR and budget drain: while traffic flows the bids win, but the cost per conversion rises. Everything hinges on latency, because a decision that takes longer than 100ms loses the auction. We build programmatic buying systems for DSPs and ad exchanges that use real-time bidding (RTB), with AI automating bid management and forecasting.

What Problems Does Programmatic Advertising AI Solve?

Latency: 50-100ms for the full cycle, and any delay means a lost impression. Research from Google Display Network shows that reducing response time by 10ms increases win rate by 5-7%. We optimized the pipeline to 30ms, which is 3x faster than typical systems.
Forecast accuracy: without good CTR/CVR predictions, bids are either too high (overpayment) or too low (lost auctions). We calibrate models on historical auctions, which saves 20-30% of budget on average. On one project we reduced CPA from $5.00 to $3.50 (a 30% reduction), saving $15,000 per month; another client saved $25,000 monthly. Typical monthly savings are $10,000–$30,000 depending on scale, with an average of $20,000.
Budget pacing: money disappears in the first hours, then the campaign stalls. Our algorithm distributes budget evenly throughout the day.
Frequency capping: one user sees a banner 20 times without clicking — wasted spend. We dynamically lower frequency for such users.

How Does Latency Affect Bid Efficiency?

Even a 10ms delay reduces win rate by 5-7%. We compensate at the infrastructure level: feature engineering is offloaded to a precompiled C++ module (Pybind11), the model is converted to ONNX with INT8 quantization, and all features are cached in Redis. Heavy computations (e.g., user embeddings) are done asynchronously before the auction. Inference time is 0.5ms, leaving room for bid shading and other optimizations.

Why LightGBM Over Neural Networks for CTR?

On tabular data with missing values and categorical features, LightGBM provides better quality with shorter training time. Neural networks overfit on sparse features and require more data and a GPU. LightGBM is 4x faster than a 2-layer MLP at inference with comparable AUC. We use LightGBM with early stopping and probability calibration. Result: AUC 0.85 on our data, inference latency 0.3ms on ONNX.
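
The probability calibration mentioned above can be checked with a reliability table: predictions are grouped into bins, and the mean predicted CTR in each bin is compared with the observed click rate. The following is a minimal sketch in plain Python; the function name `reliability_bins`, the equal-width binning, and the toy data are illustrative assumptions, not the production method.

```python
from typing import List, Tuple


def reliability_bins(preds: List[float], labels: List[int],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted, observed rate, count) for each non-empty equal-width bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(preds, labels):
        # A prediction of exactly 1.0 is clamped into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        sums[idx] += p
        hits[idx] += y
        counts[idx] += 1
    return [
        (sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Hypothetical toy data: four impressions predicted at 5%, four at 25%.
preds = [0.05, 0.05, 0.05, 0.05, 0.25, 0.25, 0.25, 0.25]
labels = [0, 0, 0, 0, 1, 0, 0, 0]
table = reliability_bins(preds, labels, n_bins=5)
```

In a well-calibrated model the first two values of each row are close; a bin where the mean prediction is far above the observed rate means bids in that range overpay.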

How We Do It

Our stack: PyTorch for complex models (user history with deep learning transformers achieving 15% AUC lift), LightGBM for tabular data, ONNX Runtime for inference (<1ms), Redis for feature store, Kubernetes for horizontal scaling.

We build a CTR prediction model with a LightGBM baseline of 500 trees. Feature extraction from OpenRTB 2.5 takes <5ms. CTR is then multiplied by CVR to get pCTCVR, the probability that an impression leads to a conversion. Multiplying pCTCVR by the target CPA gives the expected impression value, so Bid = pCTCVR × target_CPA × pacing_factor.
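
To make the bid formula concrete, here is a small worked example. The input values (2% CTR, 5% CVR, $10 target CPA, pacing factor 0.8) are hypothetical, and the conversion to CPM is an assumption based on OpenRTB expressing prices per 1000 impressions.

```python
def bid_cpm(ctr: float, cvr: float, target_cpa_usd: float, pacing_factor: float) -> float:
    """Bid = pCTCVR x target_CPA x pacing_factor, expressed per 1000 impressions (CPM)."""
    pctcvr = ctr * cvr                              # probability of a conversion per impression
    value_per_impression = pctcvr * target_cpa_usd  # expected value of one impression in USD
    return value_per_impression * pacing_factor * 1000.0


# 0.02 * 0.05 = 0.001 conversions per impression, worth $0.01 each,
# i.e. $10 CPM before pacing and $8 CPM after the 0.8 pacing factor.
example = bid_cpm(ctr=0.02, cvr=0.05, target_cpa_usd=10.0, pacing_factor=0.8)
```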

We also implement bid shading for first-price auctions: we estimate the distribution of winning bids and choose a bid below our valuation that maximizes expected profit (win probability × margin).

Work Process

Analysis: review your current DSP/SSP, auction logs, metrics.
Design: choose architecture (number of models, features, budget pacing).
Training: train CTR/CVR models on your historical data. A/B test on live traffic.
Deployment: deploy inference service on Kubernetes with auto-scaling by QPS.
Monitoring: set up dashboards (latency p99, win rate, spend rate) and alerts.
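
The p99 latency metric from the monitoring step can be sketched as a nearest-rank percentile over recent response times. This is a simplified illustration: the `percentile` helper, the sample data, and the 100ms alert threshold are assumptions, and a production dashboard would normally compute percentiles from histogram buckets rather than raw samples.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical window of 100 response times in milliseconds.
latencies_ms = [12.0] * 98 + [45.0, 120.0]
p99 = percentile(latencies_ms, 99)
alert = p99 > 100.0  # fire an alert when p99 exceeds the exchange timeout
```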

Estimated Timeline

4 to 12 weeks depending on integration complexity and data volume. Cost calculated individually after audit.

What's Included

Architectural documentation (how the system works)
Trained CTR/CVR models with calibration
Inference code on ONNX Runtime
Integration with your DSP/SSP (OpenRTB)
Grafana monitoring dashboards
Customer team training
1 month post-deployment support

Common Mistakes We've Seen

Using a neural network on small data (overfitting; LightGBM is better)
No CTR calibration → bids don't match real probability
Ignoring budget pacing → campaign stops mid-day
No frequency capping → users get tired and block the banner

With over 10 years of experience and 15+ successful projects, we guarantee achieving your target KPIs during the pilot phase. Contact us to assess your project — we will select a solution for your stack.

Model	AUC	Latency (ms)	RAM (MB)
LightGBM 500 trees	0.85	0.3	150
2-layer MLP (256,128)	0.82	1.2	200
Transformer (4 heads)	0.86	4.5	800

Component	Latency budget
Network latency	~20ms
Feature extraction	~5ms
CTR/CVR prediction	~3ms
Bid price calculation	~1ms
Response to exchange	~1ms
Total	~30ms (leaves headroom under the 50-100ms limit)
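
The latency budget above can be turned into a simple check that reports remaining headroom and flags any stage that overruns its allocation. This is a sketch only: the stage names and the helper functions are hypothetical, and the numbers are copied from the table.

```python
from typing import Dict, List

# Per-stage budget in milliseconds, taken from the table above.
STAGE_BUDGET_MS: Dict[str, float] = {
    "network": 20.0,
    "feature_extraction": 5.0,
    "ctr_cvr_prediction": 3.0,
    "bid_price_calculation": 1.0,
    "response": 1.0,
}


def headroom_ms(exchange_timeout_ms: float) -> float:
    """Time left after all budgeted stages for a given exchange timeout."""
    return exchange_timeout_ms - sum(STAGE_BUDGET_MS.values())


def over_budget(measured_ms: Dict[str, float]) -> List[str]:
    """Names of stages whose measured time exceeds the budget."""
    return [
        stage for stage, limit in STAGE_BUDGET_MS.items()
        if measured_ms.get(stage, 0.0) > limit
    ]


slow_stages = over_budget({"network": 35.0, "feature_extraction": 4.0})
```

With a 100ms timeout the headroom is 70ms, and with a 50ms timeout it is 20ms.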

Example Implementation

import zlib

import lightgbm as lgb
import numpy as np
import pandas as pd


class BidRequestFeaturizer:
    """Extract features from a bid request in < 5ms"""

    def featurize(self, bid_request: dict) -> np.ndarray:
        """
        bid_request: standard OpenRTB 2.5 object
        Returns the feature vector for the model
        """
        user = bid_request.get('user') or {}
        device = bid_request.get('device') or {}
        site = bid_request.get('site') or {}
        # Guard against a missing or empty list so indexing cannot raise IndexError.
        imp = (bid_request.get('imp') or [{}])[0]
        banner = imp.get('banner') or {}
        site_cat = (site.get('cat') or ['IAB1'])[0]
        # Read the clock once so the hour and weekday features agree.
        now = pd.Timestamp.now()
        return np.array([
            self._hash_encode(user.get('id', ''), 100),
            user.get('yob', 1990),
            int(user.get('gender') == 'M'),
            len(user.get('segments', [])),
            self._device_type_encode(device.get('devicetype')),
            int(device.get('os', '') in ['iOS', 'Android']),
            self._hash_encode(device.get('model', ''), 50),
            banner.get('w', 300),
            banner.get('h', 250),
            int(imp.get('instl') == 1),
            self._hash_encode(site.get('domain', ''), 200),
            self._hash_encode(site_cat, 20),
            now.hour,
            now.weekday(),
            int(now.weekday() >= 5),
            imp.get('bidfloor', 0),
        ], dtype=np.float32)

    def _hash_encode(self, value: str, n_buckets: int) -> int:
        # Built-in hash() is salted per process for strings, so buckets would differ
        # between training and serving. CRC32 is stable across runs.
        return zlib.crc32(value.encode('utf-8')) % n_buckets

    def _device_type_encode(self, device_type) -> int:
        mapping = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
        return mapping.get(device_type, 0)


class CTRPredictor:
    """Predict CTR (Click-Through Rate) for bid. LightGBM usually better than neural nets for tabular bid data."""

    def __init__(self):
        self.model = lgb.LGBMClassifier(
            n_estimators=500,
            learning_rate=0.05,
            num_leaves=127,
            min_child_samples=50,
            subsample=0.8,
            subsample_freq=1,  # LightGBM applies row subsampling only when this is > 0
            colsample_bytree=0.8,
            random_state=42,
            n_jobs=-1
        )

    def train(self, X, y, X_val, y_val):
        """Training with early stopping"""
        self.model.fit(X, y, eval_set=[(X_val, y_val)], eval_metric='auc',
                        callbacks=[lgb.early_stopping(50), lgb.log_evaluation(100)])

    def predict_ctr(self, X):
        return self.model.predict_proba(X)[:, 1]


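class ConversionRatePredictor:
    """CVR: probability of conversion given click"""
    def __init__(self):
        self.model = lgb.LGBMClassifier(
            n_estimators=200, learning_rate=0.05, num_leaves=63,
            min_child_samples=100, random_state=42
        )

    def train(self, X, y, X_val, y_val):
        """Fit on clicked impressions only; y marks whether the click converted"""
        self.model.fit(X, y, eval_set=[(X_val, y_val)], eval_metric='auc',
                        callbacks=[lgb.early_stopping(50), lgb.log_evaluation(100)])

    def predict_cvr(self, X):
        return self.model.predict_proba(X)[:, 1]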


class BiddingEngine:
    """Bid decision engine"""
    def __init__(self, ctr_model, cvr_model, featurizer):
        self.ctr_model = ctr_model
        self.cvr_model = cvr_model
        self.featurizer = featurizer

    def compute_bid(self, bid_request, campaign_config):
        """Compute optimal bid in <10ms. Bid, floor and cap are in CPM (USD per 1000 impressions)."""
        features = self.featurizer.featurize(bid_request).reshape(1, -1)
        ctr = float(self.ctr_model.predict_ctr(features)[0])
        cvr = float(self.cvr_model.predict_cvr(features)[0])
        pctcvr = ctr * cvr
        target_cpa = campaign_config.get('target_cpa_usd', 10)
        expected_value = pctcvr * target_cpa  # USD per single impression
        pacing_factor = self._compute_pacing_factor(campaign_config)
        # OpenRTB expresses bidfloor and bid price as CPM, so scale the
        # per-impression value by 1000 before comparing with floor and cap.
        bid_price = expected_value * pacing_factor * 1000
        floor_price = bid_request.get('imp', [{}])[0].get('bidfloor', 0)
        max_bid = campaign_config.get('max_bid_cpm', 10)
        if bid_price < floor_price:
            return {'bid': 0, 'reason': 'below_floor', 'predicted_ctr': ctr}
        final_bid = min(bid_price, max_bid)
        return {
            'bid': round(final_bid, 4),
            'predicted_ctr': round(ctr, 5),
            'predicted_cvr': round(cvr, 5),
            'predicted_pctcvr': round(pctcvr, 6),
            'pacing_factor': round(pacing_factor, 3),
            'auction_win_probability': self._estimate_win_prob(final_bid, floor_price)
        }

    def _compute_pacing_factor(self, campaign):
        budget_total = campaign.get('daily_budget_usd', 1000)
        spent_today = campaign.get('spent_today_usd', 0)
        hours_elapsed = campaign.get('hours_elapsed_today', 12)
        total_hours = 24
        expected_spent_ratio = hours_elapsed / total_hours
        actual_spent_ratio = spent_today / max(budget_total, 1)
        if actual_spent_ratio > expected_spent_ratio * 1.1:
            return 0.8
        elif actual_spent_ratio < expected_spent_ratio * 0.9:
            return 1.2
        return 1.0

    def _estimate_win_prob(self, bid, floor):
        if bid < floor:
            return 0.0
        margin = (bid - floor) / max(floor, 0.01)
        return min(0.95, 0.3 + margin * 0.5)


class BudgetPacingController:
    """Manage budget spend smoothness"""
    def throttle_bid_rate(self, campaign_stats, current_qps):
        budget = campaign_stats.get('daily_budget', 1000)
        spent = campaign_stats.get('spent', 0)
        hours = campaign_stats.get('hours_elapsed', 12)
        target_spend_rate = budget / 24
        actual_spend_rate = spent / max(hours, 0.1)
        if actual_spend_rate > target_spend_rate * 1.2:
            throttle = target_spend_rate / actual_spend_rate
            return float(np.clip(throttle, 0.1, 1.0))
        return 1.0

    def compute_optimal_frequency_cap(self, user_stats, campaign_config):
        base_cap = campaign_config.get('frequency_cap', {'hour': 2, 'day': 5, 'week': 15})
        if user_stats.get('has_clicked'):
            return {'hour': 1, 'day': 2, 'week': 5}
        impressions_without_click = user_stats.get('impressions_no_click', 0)
        if impressions_without_click > 20:
            return {'hour': 0, 'day': 1, 'week': 3}
        return base_cap


class AuctionOptimizer:
    """Optimize bidding strategy in first and second price auctions"""
    def optimal_bid_second_price(self, valuation, bid_landscape):
        return valuation

    def bid_shading_first_price(self, valuation, historical_clearing_prices):
        # Accept lists as well as arrays; the elementwise comparison below needs an ndarray.
        prices = np.asarray(historical_clearing_prices, dtype=float)
        if prices.size == 0:
            return valuation * 0.8
        best_bid = valuation * 0.5
        best_profit = -float('inf')
        # Grid search over shading levels from 50% to 95% of the valuation.
        for bid_pct in np.arange(0.5, 1.0, 0.05):
            bid = valuation * bid_pct
            win_prob = (prices < bid).mean()
            expected_profit = win_prob * (valuation - bid)
            if expected_profit > best_profit:
                best_profit = expected_profit
                best_bid = bid
        return round(best_bid, 4)

    def evaluate_campaign_performance(self, impressions):
        return {
            'impressions': len(impressions),
            'clicks': impressions['clicked'].sum(),
            'conversions': impressions['converted'].sum(),
            'spend_usd': impressions['bid_price'].sum(),
            'ctr': impressions['clicked'].mean(),
            'cvr': impressions['converted'].sum() / max(impressions['clicked'].sum(), 1),
            'cpa_usd': impressions['bid_price'].sum() / max(impressions['converted'].sum(), 1),
            'roas': impressions.get('revenue', pd.Series([0])).sum() / max(impressions['bid_price'].sum(), 1),
            'effective_cpm': impressions['bid_price'].mean() * 1000,
        }

Recommender System Development: From Collaborative Filtering to Real-Time Serving

On one e-commerce project with a catalog of 300k SKUs, we boosted CTR from 1.8% to 4.4% — a 2.4x increase. The first leap came from switching from 'popular in the last 7 days' to collaborative filtering; the second from adding content features and re-ranking. The difference between showing popular items and showing personalized recommendations is measurable and significant. Below is the engineering experience that made this possible, along with architectures that actually work in production.

Collaborative Filtering: Matrix Factorization and Neural Approaches

Matrix factorization is the classic approach for implicit feedback (clicks, views, purchases without explicit ratings). ALS (Alternating Least Squares) from the Implicit library handles user×item matrices with hundreds of millions of non-zero values in minutes on a GPU. Good starting parameters are 64–256 latent factors and regularization λ = 0.01–0.1. The weakness is cold start: with no history for new users or items, pure CF fails, and content features or a hybrid approach are needed.

Neural Collaborative Filtering (NCF) replaces the dot product with a neural network. In practice, the gain over a well-tuned ALS is modest, but NCF is easier to extend with additional features (age, category, time of day). Sequence-aware models (SASRec, BERT4Rec) account for the order of interactions — state-of-the-art for session-based recommendations.

How to Choose Recommender System Architecture?

The answer depends on data, load, and cold start requirements. Below are three main approaches with selection criteria.

Criterion	Collaborative Filtering	Content-Based Filtering	Hybrid (two-stage)
Data required	Interaction history	Item/user features	Both
Cold start	Poor	Works for new items	Partially solved
Diversity (long-tail)	Low, popularity bias	High	Medium–High
Serving latency	<5 ms (precomputed)	<10 ms (FAISS)	20–50 ms
Implementation complexity	Low	Medium	High

A hybrid architecture outperforms pure CF by 20–40% in long-tail coverage, as validated on catalogs of 100k SKUs and more.

Content-Based Filtering: When Interaction History is Scarce

Content-based filtering recommends items based on their characteristics rather than on other users' behavior, which solves cold start for new items. Item text is encoded with sentence-transformers models (multilingual-e5-base, BGE-M3), and similar items are found with a FAISS IndexFlatIP similarity search; a query over 100k items takes <5 ms. Item2Vec (Word2Vec trained on view sequences) yields interpretable 'similar items' after a couple of hours of training.
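
FAISS IndexFlatIP performs an exact inner-product search, which on L2-normalized embeddings is the same as ranking by cosine similarity. The sketch below reproduces that idea in plain Python, without FAISS, on a toy catalog. The item IDs and 3-dimensional vectors are made up for illustration; real embeddings would come from a text encoder.

```python
import heapq
import math
from typing import Dict, List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_similar(query: List[float], items: Dict[str, List[float]],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force inner-product search over all items, returning the k best matches."""
    q = normalize(query)
    scored = (
        (item_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for item_id, vec in items.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical embeddings for three catalog items.
catalog = {
    "sku_red_shirt": [1.0, 0.0, 0.2],
    "sku_blue_shirt": [0.9, 0.1, 0.3],
    "sku_kettle": [0.0, 1.0, 0.0],
}
result = top_k_similar([1.0, 0.0, 0.25], catalog, k=2)
```

The brute-force scan is linear in catalog size, which is why it stays practical at around 100k items and is usually replaced by an approximate index for much larger catalogs.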

Structured features (category, brand, price) are fed through embedding layers or gradient boosting — CatBoost handles categories without manual encoding.

Why Do Hybrid Models Work Better?

Production systems are almost always two-level. Stage 1 (Retrieval) — fast selection of 100–500 candidates from 300k items using ALS or Two-Tower model with vector search (FAISS, Qdrant). Stage 2 (Ranking) — heavy ranker on LightGBM or neural network with cross-features, time, device, and session context. LightFM is a good starting point for medium scale without heavy infrastructure. Our practice shows: moving from single-stage to two-stage yields a 15–25% accuracy improvement with only 20–30 ms additional latency.
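
The two-stage flow described above can be sketched as two small functions: a cheap dot-product retrieval over the whole catalog, followed by a heavier ranker applied only to the retrieved candidates. Everything here is illustrative: the item vectors, the `toy_ranker` lookup table, and the context dictionary stand in for a trained retrieval model and a LightGBM or neural ranker.

```python
from typing import Callable, Dict, List

Vector = List[float]


def retrieve(user_vec: Vector, item_vecs: Dict[str, Vector], n_candidates: int) -> List[str]:
    """Stage 1: score every item with a dot product and keep the best candidates."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
    }
    return sorted(scores, key=lambda item_id: scores[item_id], reverse=True)[:n_candidates]


def rank(candidates: List[str], ranker: Callable[[str, dict], float],
         context: dict, top_n: int) -> List[str]:
    """Stage 2: apply the expensive, context-aware ranker to the candidates only."""
    return sorted(candidates, key=lambda item_id: ranker(item_id, context), reverse=True)[:top_n]


# Hypothetical latent vectors and a stand-in ranker based on per-device CTR.
item_vecs = {"a": [1.0, 0.0], "b": [0.8, 0.2], "c": [0.0, 1.0], "d": [0.6, 0.1]}
mobile_ctr = {"a": 0.01, "b": 0.05, "c": 0.09, "d": 0.03}


def toy_ranker(item_id: str, context: dict) -> float:
    return mobile_ctr[item_id] if context.get("device") == "mobile" else 0.0


candidates = retrieve([1.0, 0.1], item_vecs, n_candidates=3)
final = rank(candidates, toy_ranker, {"device": "mobile"}, top_n=2)
```

Item "c" has the highest ranker score but never reaches stage 2, which shows the trade-off of the design: the ranker can only reorder what retrieval lets through.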

Real-Time Serving: Architecture Under Load

The latency SLA is 50–100 ms at thousands of requests per second. Base recommendations are precomputed by an hourly batch job and stored in Redis by user_id, so a lookup takes <5 ms. Real-time re-ranking consumes events (clicks, cart adds) from Kafka and updates context features. Features are served from Redis with a TTL (views in the last 24 hours, last clicked item). At 10k req/s, we deploy Redis Cluster with replication.
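
The TTL behavior of the feature store can be illustrated with a small in-memory stand-in. This is not Redis and not the production code: the `TTLFeatureStore` class and the key naming are hypothetical, and in Redis the same effect would come from setting keys with an expiry. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLFeatureStore:
    """In-memory key-value store where every key expires after its TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Lazily drop expired keys on read.
            del self._data[key]
            return default
        return value


# Usage with a fake clock (seconds) instead of real time.
now = [0.0]
store = TTLFeatureStore(clock=lambda: now[0])
store.set("user:42:views_24h", 7, ttl_seconds=24 * 3600)
fresh = store.get("user:42:views_24h")    # value is returned while the key is live
now[0] = 24 * 3600
expired = store.get("user:42:views_24h")  # default (None) once the TTL has passed
```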

A/B testing is the only reliable way to measure improvements. Offline metrics do not always correlate with online. Kohavi et al., 'Online Controlled Experiments at Large Scale' (KDD 2013) — a must-read for the team. Test on 5–10% of traffic, monitor CTR, conversion, revenue per session. One of our client systems after hybridization increased revenue by 18% over a month of A/B.
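
Whether a CTR difference between control and treatment is more than noise can be checked with a two-proportion z-test. The sketch below uses only the standard library; the traffic numbers are hypothetical, and a real experiment platform would also handle sequential peeking and multiple metrics.

```python
import math
from typing import Tuple


def two_proportion_z(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z statistic, two-sided p-value) for the difference in CTR between B and A."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical experiment: 1.8% CTR in control vs 2.0% in treatment, 100k sessions each.
z, p_value = two_proportion_z(clicks_a=1800, n_a=100_000, clicks_b=2000, n_b=100_000)
```

With these numbers z is about 3.3 and the p-value is well below 0.01, so the lift would be treated as statistically significant.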

Recommender System Development Timeline

The stages and typical time frames are in the table below. Costs are calculated individually based on catalog scale and latency requirements.

Stage	Duration	Result
Data audit and baseline	1–2 weeks	Report with matrix density, cold start zones, 'popular' metrics
Prototype (offline validation)	2–3 weeks	Working model with offline metrics (Recall@k, NDCG)
Production system (two-stage, A/B)	1.5–2.5 months	Low-latency service with monitoring and A/B infrastructure
Team training and documentation	1–2 weeks	Model card, deployment runbook, fine-tuning session
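
The offline metrics named in the prototype stage, Recall@k and NDCG, can be computed as follows for a single user with binary relevance. This is a minimal sketch: the function names and the toy lists are assumptions, and in practice the metrics are averaged over all test users.

```python
import math
from typing import List, Set


def recall_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """DCG of the top-k list divided by the DCG of an ideal ordering (binary relevance)."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical user: three relevant items, two of them recommended at positions 1 and 4.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "d", "z"}
recall = recall_at_k(recommended, relevant, k=4)
ndcg = ndcg_at_k(recommended, relevant, k=4)
```

Recall only counts hits, while NDCG also rewards placing them near the top, which is why both are usually reported together.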

What's Included in Turnkey Development

Data audit — user×item matrix density (typically <0.1%), activity distribution, temporal patterns, cold start statistics.
Baseline — 'popular' as a simple threshold that is often hard to beat.
Iterative improvement — ALS → content features → two-stage → sequence-aware. Each step with A/B.
Serving infrastructure — batch precomputation, Redis, real-time re-ranking, Grafana monitoring.
Documentation — model card with metrics, deployment instructions, feature descriptions.
Team training — session on interpreting results and model fine-tuning.
Support — 1 month post-launch (incident fixes, pipeline tuning).
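
The matrix density checked during the data audit is the share of user×item cells that contain at least one interaction. A minimal sketch, assuming interactions arrive as (user_id, item_id) pairs; the function name and the toy events are illustrative.

```python
from typing import Iterable, Tuple


def matrix_density(interactions: Iterable[Tuple[str, str]], n_users: int, n_items: int) -> float:
    """Fraction of user-item cells with at least one interaction."""
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    unique_pairs = set(interactions)  # repeated events on the same pair count once
    return len(unique_pairs) / (n_users * n_items)


# Hypothetical log: 4 events, 3 distinct pairs, in a 4-user by 1000-item catalog.
events = [("u1", "sku_a"), ("u1", "sku_a"), ("u2", "sku_b"), ("u3", "sku_a")]
density = matrix_density(events, n_users=4, n_items=1000)
```

Here the density is 0.075%, below the 0.1% level mentioned above as typical.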

We are a team with 7+ years of experience in recommender systems, having delivered over 30 projects for e-commerce and media. We guarantee transparent A/B testing and documented metric improvements.

Want to assess the growth potential of your catalog? Contact us for a free data audit. Order recommender system development — first prototype within two weeks.

Example ALS config for implicit feedback

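from implicit.als import AlternatingLeastSquares

model = AlternatingLeastSquares(
    factors=64,
    regularization=0.05,
    iterations=15,
    use_gpu=True  # requires a CUDA-enabled build of implicit; set to False to train on CPU
)
# user_item_matrix: scipy.sparse CSR matrix of shape (n_users, n_items)
# holding interaction weights, as expected by implicit >= 0.5
model.fit(user_item_matrix)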

For more on the mathematics of recommender systems, see the specialized literature.