Shadow Deployment Setup for ML Models

Setting up Shadow Deployment for ML models

Shadow deployment (mirror deployment) is a strategy in which a new version of a model receives the same requests as production, but its responses are not delivered to users. The goal is to test the new model's behavior on real traffic without any risk to users.

When to use shadow deployment

  • A radical change in the model architecture (for example, a transition from gradient boosting to a neural network)
  • The new version has not been fully validated offline, and its behavior on real production data needs to be observed first
  • Checking latency and resource utilization under real load
  • Validating the data processing pipeline before rolling out the new version
  • Testing a smaller model on real traffic before it replaces a large LLM

Architecture

[User Request]
      |
      ├──→ [Production Model V1] ──→ [Response to User]
      |
      └──→ [Shadow Model V2] ──→ [Prediction logged, not returned]
                                         |
                                   [Comparison DB]
                                         |
                                 [Metrics Dashboard]

Implementation with Istio/Envoy and Nginx

Istio mirror:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ml-inference
spec:
  hosts:
    - ml-inference
  http:
    - route:
        - destination:
            host: ml-inference
            subset: v1
          weight: 100
      mirror:
        host: ml-inference
        subset: v2-shadow
      mirrorPercentage:
        value: 100.0  # Mirror 100% of traffic

Nginx mirror:

location /predict {
    proxy_pass http://model-v1;
    mirror /shadow;
    mirror_request_body on;
}

location = /shadow {
    internal;
    proxy_pass http://model-v2-shadow/predict;
}

Application-level implementation

For more flexible logging and comparison, mirroring can be implemented in the application code itself:

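import asyncio
import logging
from datetime import datetime, timezone

# production_model, shadow_model, comparison_store and hash_features
# are assumed to be defined elsewhere in the service.

# Keep references to in-flight shadow tasks so they are not garbage-collected
_shadow_tasks = set()

async def predict_with_shadow(request_features):
    # Production model: awaited, because its result is returned to the user.
    # predict() is blocking, so it runs in a worker thread (Python 3.9+).
    production_result = await asyncio.to_thread(
        production_model.predict, request_features
    )

    # Shadow model: scheduled in the background, does not delay the response
    task = asyncio.create_task(
        run_shadow_prediction(request_features, production_result)
    )
    _shadow_tasks.add(task)
    task.add_done_callback(_shadow_tasks.discard)

    return production_result

async def run_shadow_prediction(features, production_result):
    try:
        # Run the blocking predict() in a worker thread to keep the event loop free
        shadow_result = await asyncio.to_thread(shadow_model.predict, features)

        # Log both predictions for later comparison
        comparison_store.log({
            'timestamp': datetime.now(timezone.utc),
            'production_score': float(production_result),
            'shadow_score': float(shadow_result),
            'agreement': abs(float(production_result) - float(shadow_result)) < 0.1,
            'features_hash': hash_features(features)
        })
    except Exception:
        # A failure in the shadow model must never affect production
        logging.exception("Shadow prediction failed")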
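
The snippet above calls hash_features without defining it. The following is a minimal sketch of one possible implementation, assuming the features arrive as a JSON-serializable dict; the function body is an illustration, not part of the original service.

```python
import hashlib
import json


def hash_features(features: dict) -> str:
    """Return a stable SHA-256 hex digest for a feature mapping (hypothetical helper)."""
    # sort_keys makes the digest independent of key order;
    # default=str is a fallback for values json cannot serialize natively
    canonical = json.dumps(
        features, sort_keys=True, separators=(",", ":"), default=str
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A content-based digest lets the production and shadow records for the same request be joined later without storing raw features. Python's built-in hash() would not work for this purpose, because string hashes are randomized per process.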

Comparison metrics

Agreement rate is the share of requests for which the two models' predictions match within a specified tolerance:

threshold = 0.1  # example tolerance, the same value as in the logging code above
df['agreement'] = (df['production'] - df['shadow']).abs() < threshold
agreement_rate = df['agreement'].mean()
# Target: > 95% agreement for critical systems

Prediction distribution comparison:

from scipy.stats import ks_2samp

ks_stat, p_value = ks_2samp(df['production'], df['shadow'])
# If p_value < 0.05, the distributions differ significantly.
# On large samples even tiny shifts are significant, so also look at ks_stat itself.

Latency comparison: because shadow responses are discarded, a slower shadow model does not affect users, but higher shadow latency is an early warning of latency problems after the switch.
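
As an illustration of the latency comparison, here is a minimal sketch that compares latency percentiles of the two models. It assumes that per-request latencies in milliseconds have been collected into two lists; the function names and the choice of p50/p95/p99 are assumptions, not something the setup above prescribes.

```python
from statistics import quantiles


def latency_percentiles(latencies_ms):
    """Return p50, p95 and p99 for a list of latencies (needs at least 2 values)."""
    # n=100 yields 99 cut points; cut point i (0-based) is percentile i + 1
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def compare_latency(production_ms, shadow_ms):
    """Compare shadow vs production percentiles; ratio > 1 means shadow is slower."""
    prod = latency_percentiles(production_ms)
    shadow = latency_percentiles(shadow_ms)
    return {
        name: {
            "production": prod[name],
            "shadow": shadow[name],
            # assumes strictly positive production latencies
            "ratio": shadow[name] / prod[name],
        }
        for name in prod
    }
```

Tail percentiles are usually more informative than the mean here, since SLA breaches tend to show up in p95/p99 first.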

When to switch from Shadow to Canary

Recommendations for the transition:

  • Shadow testing has run for at least 1 week on real traffic
  • Agreement rate > 95% (or an agreed business decision on the acceptable discrepancy)
  • Shadow model latency is within the SLA (even though it does not affect users yet)
  • Resource utilization is normal at peak load
  • No unexpected errors in the shadow service logs
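
The checklist above can be expressed as an automated gate. This is a minimal sketch under stated assumptions: the ShadowReport fields, the 200 ms latency SLA and the 0.8 CPU ceiling are hypothetical placeholders, while the 7-day minimum and the 95% agreement target come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ShadowReport:
    """Summary of one shadow run (field names are hypothetical)."""
    days_observed: float
    agreement_rate: float          # fraction in [0, 1]
    shadow_p95_latency_ms: float
    peak_cpu_utilization: float    # fraction in [0, 1]
    unexpected_errors: int


def canary_blockers(report, *, min_days=7.0, min_agreement=0.95,
                    latency_sla_ms=200.0, max_peak_cpu=0.8):
    """Return the unmet criteria; an empty list means ready for canary."""
    blockers = []
    if report.days_observed < min_days:
        blockers.append("shadow run shorter than the minimum duration")
    if report.agreement_rate <= min_agreement:
        blockers.append("agreement rate not above the target")
    if report.shadow_p95_latency_ms >= latency_sla_ms:
        blockers.append("shadow latency not below the SLA")
    if report.peak_cpu_utilization > max_peak_cpu:
        blockers.append("resource utilization too high at peak load")
    if report.unexpected_errors > 0:
        blockers.append("unexpected errors in shadow service logs")
    return blockers
```

Returning the list of blockers rather than a single boolean makes it clear which criterion is holding up the transition. The thresholds should be replaced with values agreed for the specific system.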

Shadow deployment is one of the safest testing strategies, because users never see the new model's output. It is especially valuable for systems where the cost of error is high: financial decisions, medical diagnostics, security systems.