ML Model Drift Monitoring Setup (Data Drift, Concept Drift)
ML models in production rarely degrade all at once. Degradation is usually gradual, and without dedicated monitoring, teams notice it too late, when business metrics have already dropped. Drift monitoring detects changes in the data or in model behavior at an early stage.
Types of Drift
Data drift (covariate shift) — change in the distribution of input features. The model sees data that differs from what it was trained on. Example: seasonal changes in purchasing behavior alter the distribution of the "average time between purchases" feature.
Concept drift — change in the relationship between features and the target variable. Example: fraud patterns change, and features that previously reliably predicted fraud lose predictive power.
Label drift — change in the distribution of the target variable. Example: the proportion of positive examples in a binary classification task changes significantly.
Prediction drift — change in the distribution of model predictions. Can be monitored without labeled data.
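A toy simulation makes the difference between data drift and concept drift concrete. The sketch below is illustrative only. The single feature `x`, the learned rule `x > 0`, and the shift sizes are made-up assumptions, and the hypothetical `old_rule` function stands in for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def old_rule(x):
    """Toy 'model': the relationship learned at training time (positive when x > 0)."""
    return (x > 0).astype(int)

# Reference window the model was trained on: x ~ N(0, 1)
x_ref = rng.normal(0.0, 1.0, n)

# Data drift: P(x) shifts, but the true relationship y = (x > 0) is unchanged
x_cov = rng.normal(1.0, 1.0, n)
y_cov = (x_cov > 0).astype(int)

# Concept drift: P(x) is unchanged, but the true relationship becomes y = (x > 1)
x_con = rng.normal(0.0, 1.0, n)
y_con = (x_con > 1).astype(int)

def accuracy(x, y):
    return float(np.mean(old_rule(x) == y))

for name, x, y in [("data drift", x_cov, y_cov), ("concept drift", x_con, y_con)]:
    print(f"{name:>13}: feature mean shift={x.mean() - x_ref.mean():+.2f}, "
          f"positive predictions={old_rule(x).mean():.2f}, "
          f"positive labels={y.mean():.2f}, accuracy={accuracy(x, y):.2f}")
```

Under data drift, the feature distribution and the share of positive predictions move, but accuracy stays at 1.0 because the learned rule is still correct. Under concept drift, the features and the predictions look unchanged, yet accuracy falls to roughly two thirds, and only labeled data reveals the drop. In both scenarios the share of positive labels also changes, so in practice the drift types often overlap.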
Statistical Tests and Distance Metrics for Drift Detection
| Test / metric | Application | Typical drift threshold |
|---|---|---|
| Kolmogorov-Smirnov test | Continuous features | p-value < 0.05 |
| Chi-squared test | Categorical features | p-value < 0.05 |
| PSI (Population Stability Index) | Categorical or binned continuous features | PSI > 0.2: strong drift (rule of thumb) |
| Jensen-Shannon divergence | Any distributions (binned if continuous) | JS > 0.1 |
| Maximum Mean Discrepancy (MMD) | Multivariate drift | Kernel-dependent; commonly set via a permutation test |
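The sketch below shows one way to run the checks from the table with SciPy and NumPy on synthetic data. The samples, category counts, and kernel width `gamma=0.5` are assumptions for illustration. SciPy's `jensenshannon` returns the Jensen-Shannon *distance*, which is the square root of the divergence, so check which of the two a given threshold refers to. The MMD part is hand-written. `rbf_mmd2` and `mmd_permutation_test` are hypothetical helpers that compute a biased RBF-kernel MMD² estimate and use a permutation test for the p-value.

```python
import numpy as np
from scipy import stats
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)

# Kolmogorov-Smirnov: continuous feature whose mean shifted by 0.3
ref_num = rng.normal(0.0, 1.0, 5_000)
cur_num = rng.normal(0.3, 1.0, 5_000)
ks = stats.ks_2samp(ref_num, cur_num)
print(f"KS: statistic={ks.statistic:.3f}, p-value={ks.pvalue:.2e}")

# Chi-squared: counts per category (e.g. web / ios / android) in each window
ref_counts = np.array([5000, 3000, 2000])
cur_counts = np.array([4000, 3000, 3000])
chi2, p_chi2, dof, _ = stats.chi2_contingency(np.vstack([ref_counts, cur_counts]))
print(f"Chi-squared: statistic={chi2:.1f}, dof={dof}, p-value={p_chi2:.2e}")

# Jensen-Shannon: SciPy returns the distance (sqrt of the divergence), in [0, 1] for base=2
js = jensenshannon(ref_counts / ref_counts.sum(), cur_counts / cur_counts.sum(), base=2)
print(f"JS distance={js:.3f}")

# MMD: biased squared-MMD estimate with an RBF kernel exp(-gamma * ||a - b||^2)
def rbf_mmd2(x, y, gamma=0.5):
    def kernel_mean(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists).mean()
    return kernel_mean(x, x) + kernel_mean(y, y) - 2 * kernel_mean(x, y)

def mmd_permutation_test(x, y, gamma=0.5, n_perm=100, seed=0):
    """Permutation p-value (with +1 correction): how often a random re-split
    of the pooled data gives an MMD^2 at least as large as the observed one."""
    perm_rng = np.random.default_rng(seed)
    observed = rbf_mmd2(x, y, gamma)
    pooled = np.vstack([x, y])
    exceed = 0
    for _ in range(n_perm):
        idx = perm_rng.permutation(len(pooled))
        exceed += rbf_mmd2(pooled[idx[:len(x)]], pooled[idx[len(x):]], gamma) >= observed
    return observed, (exceed + 1) / (n_perm + 1)

# Multivariate drift: both marginals stay N(0, 1), only the correlation changes (0 -> 0.9)
ref_mv = rng.normal(size=(300, 2))
z = rng.normal(size=(300, 2))
cur_mv = np.column_stack([z[:, 0], 0.9 * z[:, 0] + np.sqrt(1 - 0.9**2) * z[:, 1]])
mmd2, p_mmd = mmd_permutation_test(ref_mv, cur_mv)
print(f"MMD^2={mmd2:.4f}, permutation p-value={p_mmd:.3f}")
```

The MMD example changes only the correlation between the two features. Each marginal distribution stays N(0, 1), so per-feature tests such as KS would not be expected to flag the change, and this is the case multivariate methods are meant for. With thousands of samples, the p-value tests also flag very small shifts. This is one reason distance metrics with fixed thresholds, such as PSI and JS, are often used alongside them.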
Monitoring Tools
Evidently AI — open-source library for generating drift reports:
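```python
# Report API as of Evidently 0.4.x; later releases reorganized these imports
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, ClassificationPreset

# training_data / production_data_last_week: pandas DataFrames with the same columns.
# ClassificationPreset also needs "target" and "prediction" columns (or a ColumnMapping).
report = Report(metrics=[
    DataDriftPreset(),
    ClassificationPreset(),
])
report.run(
    reference_data=training_data,
    current_data=production_data_last_week,
)
report.save_html("drift_report.html")
```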
Whylogs / WhyLabs — lightweight library for logging statistical data profiles in real-time. Minimal overhead on production inference.
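The low overhead comes from logging compact statistical summaries instead of raw rows. whylogs builds its profiles on mergeable sketches. The class below is not the whylogs API. It is a hypothetical, pure-Python `FeatureProfile` that illustrates the idea for a single numeric feature. It uses constant memory per feature, computes the mean and variance online with Welford's algorithm, and provides a `merge` that combines profiles from different workers or time windows.

```python
import math
from dataclasses import dataclass

@dataclass
class FeatureProfile:
    """Hypothetical constant-memory summary of one numeric feature."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        # Welford's online update: O(1) time and memory per logged value
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def merge(self, other: "FeatureProfile") -> "FeatureProfile":
        # Combine two profiles (e.g. two workers or two hours) without the raw data
        n = self.count + other.count
        if n == 0:
            return FeatureProfile()
        delta = other.mean - self.mean
        return FeatureProfile(
            count=n,
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta**2 * self.count * other.count / n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def std(self) -> float:
        # Sample standard deviation
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0

# Two inference workers each profile their own traffic, then the profiles are merged
worker_a, worker_b = FeatureProfile(), FeatureProfile()
for value in [1.0, 2.0, 3.0]:
    worker_a.update(value)
for value in [4.0, 5.0]:
    worker_b.update(value)
merged = worker_a.merge(worker_b)
print(merged.count, merged.mean, round(merged.std, 4))  # 5 3.0 1.5811
```

With this approach, a drift job can compare a week of merged production profiles against the reference profile without storing the raw production data.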
Arize AI, Fiddler, Arthur — commercial platforms with ready-made dashboards, alerts, and production data labeling capabilities.
Grafana + Prometheus — custom monitoring where drift metrics are exported as Prometheus metrics.
Monitoring Without Ground Truth
The classic problem: in production, ground truth (the correct answer) arrives with a delay or not at all. Without labeled data, you can still monitor:
- Prediction drift — change in prediction distribution
- Feature drift — change in input feature distribution
- Confidence distribution — change in model confidence
- Business proxy metrics — e.g., CTR as proxy for recommendation quality
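For the confidence signals listed above, you can compute a label-free summary of each batch of predicted probabilities and compare it against the same summary on a reference window. This is a minimal sketch that assumes a classifier returning an `(n, k)` probability matrix. `confidence_summary` and the 0.6 low-confidence cutoff are hypothetical choices.

```python
import numpy as np

def confidence_summary(probs, low_threshold=0.6):
    """Label-free summary of a batch of predicted class probabilities, shape (n, k)."""
    probs = np.asarray(probs, dtype=float)
    top = probs.max(axis=1)  # confidence of the predicted class
    # Shannon entropy per prediction (natural log); higher means less certain
    entropy = -np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0)), axis=1)
    return {
        "mean_confidence": float(top.mean()),
        "mean_entropy": float(entropy.mean()),
        "low_confidence_share": float(np.mean(top < low_threshold)),
        # Mix of predicted classes: the basic prediction drift signal
        "predicted_class_share": np.bincount(probs.argmax(axis=1), minlength=probs.shape[1]) / len(probs),
    }

reference = np.array([[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.1, 0.9]])
current = np.array([[0.55, 0.45], [0.6, 0.4], [0.5, 0.5], [0.45, 0.55]])
ref_summary = confidence_summary(reference)
cur_summary = confidence_summary(current)
for key in ("mean_confidence", "mean_entropy", "low_confidence_share"):
    print(f"{key}: {ref_summary[key]:.3f} -> {cur_summary[key]:.3f}")
print("predicted class share:", ref_summary["predicted_class_share"], "->", cur_summary["predicted_class_share"])
```

In this toy example, both windows have the same mix of predicted classes, so a check on predicted classes alone would miss the change. Mean confidence, however, drops from about 0.89 to 0.55, and three quarters of the current predictions fall below the cutoff.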
Alert Setup
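```python
# Compute PSI for one feature and export it to Prometheus; Grafana alerts on the metric
import numpy as np
from prometheus_client import Gauge, start_http_server

# Register the gauge once at module level; the "feature" label covers many features
FEATURE_PSI = Gauge("model_feature_psi", "PSI between reference and production data", ["feature"])

def compute_psi(expected, actual, buckets=10, eps=1e-6):
    expected, actual = np.asarray(expected, float), np.asarray(actual, float)
    # Bucket edges come from the reference sample only, so both samples share them;
    # np.digitize puts out-of-range production values into the two end buckets
    edges = np.histogram_bin_edges(expected, bins=buckets)[1:-1]
    expected_pct = np.bincount(np.digitize(expected, edges), minlength=buckets) / len(expected)
    actual_pct = np.bincount(np.digitize(actual, edges), minlength=buckets) / len(actual)
    # Smoothing to avoid log(0) and division by zero in empty buckets
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

start_http_server(8000)  # expose /metrics for Prometheus to scrape (port is an example)
# reference_feature / production_feature: 1-D arrays of one feature's values
psi_value = compute_psi(reference_feature, production_feature)
FEATURE_PSI.labels(feature="feature_x").set(psi_value)
```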
Alerts are configured in Grafana on this metric. PSI > 0.2 raises a warning, and PSI > 0.25 raises a critical alert with a Slack/PagerDuty notification.
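Grafana evaluates these thresholds itself, but the same two-level logic is easy to mirror in code. This can help in a batch job that sends notifications directly, or in unit tests for the thresholds. The sketch below is minimal. `PsiAlertState` is a hypothetical helper that also requires a breach to persist across several consecutive evaluations, similar in spirit to a pending period on a Grafana alert rule, so a single noisy PSI value does not page anyone.

```python
from collections import deque

WARNING_PSI = 0.2    # thresholds from the alert rules above
CRITICAL_PSI = 0.25

class PsiAlertState:
    """Hypothetical helper: escalate only when PSI stays above a threshold
    for `pending` consecutive evaluations of one feature."""

    def __init__(self, pending: int = 3):
        self.history = deque(maxlen=pending)

    def update(self, psi: float) -> str:
        self.history.append(psi)
        if len(self.history) < self.history.maxlen:
            return "ok"  # not enough evaluations yet
        sustained = min(self.history)  # level exceeded on every evaluation in the window
        if sustained > CRITICAL_PSI:
            return "critical"
        if sustained > WARNING_PSI:
            return "warning"
        return "ok"

state = PsiAlertState(pending=3)
severities = [state.update(psi) for psi in [0.1, 0.3, 0.3, 0.3, 0.22, 0.1]]
print(severities)  # ['ok', 'ok', 'ok', 'critical', 'warning', 'ok']
```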
Response Process
When drift is detected: 1) analyze what changed in the data; 2) decide whether to retrain the model or fix the feature engineering; 3) under concept drift, the model architecture may need to be redesigned. Monitoring without a response process is of little use, so describe a runbook for each alert type in advance.