Canary Deployment Setup for ML Models


Canary deployment is a strategy for gradually rolling out a new model version: first to a small share of traffic (5-10%), then, if no issues appear, to the wider audience. This limits the blast radius of a bad version and allows rapid rollback.

When Canary is preferable to Blue-Green

Blue-green switches all traffic at once, which suits services where confidence in the new version is high. Canary is needed when:

  • The model is trained on new data and user reaction is unpredictable
  • The model architecture has changed (different model type, different input features)
  • The service is critical in production and the cost of failure is high
  • There is no complete suite of integration tests
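In practice the split is done by the serving infrastructure, but the underlying routing logic can be sketched as sticky, hash-based bucketing, so a given user consistently sees the same model version while traffic is split (a minimal illustration; the user-ID scheme is an assumption):

```python
import hashlib

def route_to_canary(user_id: str, canary_percent: int) -> bool:
    """Sticky canary routing: hash the user ID into a 0-99 bucket and
    compare against the canary percentage, so each user consistently
    lands on the same model version between requests."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent

# With 10% canary traffic, roughly 1 in 10 users hits the new model.
share = sum(route_to_canary(f"user-{i}", 10) for i in range(10_000)) / 10_000
```

Hashing (rather than random sampling per request) matters for ML canaries: it keeps a user's predictions consistent, which makes per-cohort metric comparison meaningful.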

Implementation on Kubernetes with KServe

KServe (formerly KFServing) supports canary out of the box:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-detector
spec:
  predictor:
    canaryTrafficPercent: 10  # 10% to the new version
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://models/fraud-detector-v2/
    # The previous revision is the canary baseline

Traffic switching without downtime:

# Increase from 10% to 50%
kubectl patch inferenceservice fraud-detector \
  --type='json' \
  -p='[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 50}]'

# Promote the canary to production (100%)
kubectl patch inferenceservice fraud-detector \
  --type='json' \
  -p='[{"op": "remove", "path": "/spec/predictor/canaryTrafficPercent"}]'

Implementation on Seldon Core

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: fraud-detector
spec:
  predictors:
    - name: main
      replicas: 3
      traffic: 90
      graph:
        name: fraud-v1
        implementation: SKLEARN_SERVER
        modelUri: s3://models/fraud-v1
    - name: canary
      replicas: 1
      traffic: 10
      graph:
        name: fraud-v2
        implementation: SKLEARN_SERVER
        modelUri: s3://models/fraud-v2
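Seldon Core requires the traffic weights across predictors to sum to 100. A small helper keeps the two weights consistent when shifting traffic (the function name is an assumption):

```python
def seldon_traffic_split(canary_percent: int) -> dict:
    """Traffic weights for a two-predictor SeldonDeployment.
    Seldon requires the weights across predictors to sum to 100,
    so the main predictor always gets the remainder."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary percent must be within 0..100")
    return {"main": 100 - canary_percent, "canary": canary_percent}
```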

Automatic traffic management

Traffic can be increased progressively and automatically, gated on metrics:

import time

def progressive_canary_rollout(service_name, metrics_client):
    stages = [5, 10, 25, 50, 100]

    for target_traffic in stages:
        set_canary_traffic(service_name, target_traffic)
        time.sleep(300)  # 5 minutes for metrics to stabilize

        metrics = metrics_client.get_metrics(window='5m')

        # Check the guardrail metrics
        if metrics['canary_error_rate'] > 0.01:
            rollback_canary(service_name)
            alert(f"Canary rollback: error rate {metrics['canary_error_rate']:.2%}")
            return False

        if metrics['canary_p99_latency_ms'] > 500:
            rollback_canary(service_name)
            alert("Canary rollback: latency SLA violated")
            return False

        if metrics['business_metric_delta'] < -0.02:  # -2% degradation
            rollback_canary(service_name)
            alert("Canary rollback: business metric degraded")
            return False

    return True  # Full rollout succeeded

Metrics for promotion decisions

Metric            Promotion condition    Rollback condition
Error rate        < 0.5%                 > 1%
p99 latency       < 200 ms               > 500 ms
Prediction drift  PSI < 0.1              PSI > 0.2
Business proxy    degradation ≤ 1%       degradation > 3%
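The prediction-drift row relies on the Population Stability Index. A minimal sketch of computing PSI between the baseline and canary score distributions (requires NumPy; the binning scheme and epsilon are implementation choices, not a standard):

```python
import numpy as np

def psi(baseline: np.ndarray, canary: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two score distributions.
    Shared bin edges span both samples; a small epsilon keeps
    empty bins from producing log(0)."""
    edges = np.linspace(min(baseline.min(), canary.min()),
                        max(baseline.max(), canary.max()), bins + 1)
    p = np.histogram(baseline, bins=edges)[0] / len(baseline)
    q = np.histogram(canary, bins=edges)[0] / len(canary)
    eps = 1e-6
    p, q = np.clip(p, eps, None), np.clip(q, eps, None)
    return float(np.sum((p - q) * np.log(p / q)))
```

Identical distributions give PSI near 0; a shift of one standard deviation pushes it well past the 0.2 rollback threshold in the table.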

Integration with Argo Rollouts

Argo Rollouts is a Kubernetes controller with canary and blue-green support for any workload, not just ML:

spec:
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: {duration: 5m}
        - setWeight: 25
        - pause: {duration: 10m}
        - setWeight: 50
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: ml-model-metrics
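
The referenced ml-model-metrics template is defined as a separate AnalysisTemplate resource. A sketch assuming a Prometheus provider; the address and the metric names (model_requests_errors_total, model_requests_total) are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: ml-model-metrics
spec:
  metrics:
    - name: canary-error-rate
      interval: 1m
      failureLimit: 1
      successCondition: result[0] < 0.01
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090  # assumed address
          query: |
            sum(rate(model_requests_errors_total{service="fraud-detector"}[5m]))
            / sum(rate(model_requests_total{service="fraud-detector"}[5m]))
```

If the analysis run fails, Argo Rollouts aborts the canary and shifts traffic back to the stable version automatically.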

Canary deployment as a practice reduces MTTR (mean time to recovery) when a model degrades: instead of a full rollback and redeploy, it is enough to drop canary traffic to zero.