# Setting up Canary Deployment for ML Models
Canary deployment is a strategy for gradually rolling out a new model version: first to a small share of traffic (typically 5-10%), then, if no issues surface, to a progressively larger audience. This avoids degrading the experience for all users at once and allows for rapid rollback.
## When canary is preferable to blue-green
Blue-green switches all traffic at once, which suits services with high confidence in the new version. Canary is needed when:
- the model is trained on new data, but user reaction to it is unpredictable
- the model architecture has changed (a different model type or different input features)
- the service is critical in production, with a high cost of failure
- there is no complete suite of integration tests
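The core mechanism behind every canary setup is weighted traffic splitting. A minimal sketch (plain Python, not tied to any serving platform) of how a 10% split behaves per request:

```python
import random

def choose_version(canary_percent: int, rng: random.Random) -> str:
    """Route a single request: canary_percent% of traffic goes to the canary."""
    return "canary" if rng.uniform(0, 100) < canary_percent else "stable"

# Simulate 10,000 requests with a 10% canary split
rng = random.Random(42)
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[choose_version(10, rng)] += 1

print(counts)  # roughly 9000 stable / 1000 canary
```

In practice this split is done by the service mesh or ingress (Istio, KServe, Seldon), but the per-request coin flip above is all "10% canary traffic" means.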
## Implementation on Kubernetes with KServe
KServe (formerly KFServing) supports canary out of the box:
```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-detector
spec:
  predictor:
    canaryTrafficPercent: 10  # 10% of traffic goes to the new version
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://models/fraud-detector-v2/
      # The previously promoted revision serves as the baseline for the other 90%
```
Traffic switching without downtime:
```bash
# Increase canary traffic from 10% to 50%
kubectl patch inferenceservice fraud-detector \
  --type='json' \
  -p='[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 50}]'

# Promote the canary to production (100%)
kubectl patch inferenceservice fraud-detector \
  --type='json' \
  -p='[{"op": "remove", "path": "/spec/predictor/canaryTrafficPercent"}]'
```
## Implementation on Seldon Core
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: fraud-detector
spec:
  predictors:
  - name: main
    replicas: 3
    traffic: 90
    graph:
      name: fraud-v1
      implementation: SKLEARN_SERVER
      modelUri: s3://models/fraud-v1
  - name: canary
    replicas: 1
    traffic: 10
    graph:
      name: fraud-v2
      implementation: SKLEARN_SERVER
      modelUri: s3://models/fraud-v2
```
## Automatic traffic management
The traffic share can be increased progressively and automatically, gated on metrics:
```python
import time

def progressive_canary_rollout(service_name, metrics_client):
    """Gradually shift traffic to the canary, rolling back if a guardrail trips.

    set_canary_traffic, rollback_canary and alert are deployment-specific
    helpers (e.g. thin wrappers around `kubectl patch` and the alerting system).
    """
    stages = [5, 10, 25, 50, 100]
    for target_traffic in stages:
        set_canary_traffic(service_name, target_traffic)
        time.sleep(300)  # 5 minutes of stabilization
        metrics = metrics_client.get_metrics(window='5m')

        # Check guardrail metrics
        if metrics['canary_error_rate'] > 0.01:
            rollback_canary(service_name)
            alert(f"Canary rollback: error rate {metrics['canary_error_rate']:.2%}")
            return False
        if metrics['canary_p99_latency_ms'] > 500:
            rollback_canary(service_name)
            alert("Canary rollback: latency SLA violated")
            return False
        if metrics['business_metric_delta'] < -0.02:  # -2% degradation
            rollback_canary(service_name)
            alert("Canary rollback: business metric degraded")
            return False
    return True  # Full rollout succeeded
```
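The helpers above are deployment-specific. For the KServe setup shown earlier they can be thin wrappers around `kubectl patch`; a sketch, assuming `kubectl` is on the PATH and kubeconfig is set up (the pure patch-builder is separated out so it can be tested without a cluster):

```python
import json
import subprocess

def build_canary_patch(percent):
    """Build the JSON patch for /spec/predictor/canaryTrafficPercent.

    percent=None removes the field, i.e. promotes the canary to 100%.
    """
    if percent is None:
        return [{"op": "remove", "path": "/spec/predictor/canaryTrafficPercent"}]
    return [{"op": "replace",
             "path": "/spec/predictor/canaryTrafficPercent",
             "value": percent}]

def set_canary_traffic(service_name, percent):
    """Apply the patch to the InferenceService via kubectl."""
    subprocess.run(
        ["kubectl", "patch", "inferenceservice", service_name,
         "--type=json", "-p", json.dumps(build_canary_patch(percent))],
        check=True,
    )

def rollback_canary(service_name):
    set_canary_traffic(service_name, 0)  # drain the canary immediately
```

Setting traffic to 0 rather than deleting the canary keeps the new revision warm, so a retry after a fix does not pay the cold-start cost again.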
## Metrics for promotion decisions
| Metric | Promotion condition | Rollback condition |
|---|---|---|
| Error rate | < 0.5% | > 1% |
| p99 latency | < 200 ms | > 500 ms |
| Prediction drift | PSI < 0.1 | PSI > 0.2 |
| Business proxy | no degradation > 1% | degradation > 3% |
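The PSI (Population Stability Index) in the drift row compares the distribution of the canary's prediction scores against the baseline's over the same window. A minimal sketch over pre-binned score histograms (the binning and the smoothing constant `eps` are assumptions):

```python
import math

def compute_psi(expected, actual, eps=1e-6):
    """PSI between two binned distributions: sum((a - e) * ln(a / e)).

    expected/actual: per-bin fractions of baseline and canary predictions,
    computed over the same bins, each summing to ~1. eps guards empty bins.
    """
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

baseline = [0.25, 0.25, 0.25, 0.25]  # baseline score histogram
canary = [0.40, 0.30, 0.20, 0.10]    # canary score histogram, shifted low

print(round(compute_psi(baseline, canary), 3))  # 0.228, above the 0.2 rollback threshold
```

PSI near 0 means the canary scores the same population the same way; values above ~0.2 conventionally indicate a significant shift and, per the table above, trigger a rollback.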
## Integration with Argo Rollouts
Argo Rollouts is a Kubernetes controller with canary and blue-green support for any workload, not just ML:
```yaml
spec:
  strategy:
    canary:
      steps:
      - setWeight: 5
      - pause: {duration: 5m}
      - setWeight: 25
      - pause: {duration: 10m}
      - setWeight: 50
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: ml-model-metrics
```
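The `ml-model-metrics` template referenced in the analysis step has to be defined separately as an `AnalysisTemplate`. A sketch of what it might look like with the Prometheus provider; the Prometheus address, metric names, and query here are illustrative assumptions, not part of the original setup:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: ml-model-metrics
spec:
  metrics:
  - name: canary-error-rate
    interval: 1m
    failureLimit: 3
    # Keep promoting only while the error rate stays under 1%
    successCondition: result[0] < 0.01
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090  # assumed in-cluster address
        query: |
          sum(rate(model_requests_errors_total{service="fraud-detector"}[5m]))
          / sum(rate(model_requests_total{service="fraud-detector"}[5m]))
```

With this in place, Argo Rollouts runs the query on each interval and aborts the rollout automatically after `failureLimit` failed measurements, mirroring the guardrail logic of the Python example above.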
As a practice, canary deployment reduces MTTR (mean time to recovery) when a model degrades: instead of a full rollback and redeploy, it is enough to drop canary traffic to zero.