Setting up automatic model retraining
A model trained once inevitably degrades: data changes, user behavior evolves, and new patterns emerge. Automatic retraining is a system that monitors the quality of the model and initiates a training cycle when degradation is detected or on a scheduled basis.
Retraining triggers
There are two approaches: schedule-based and trigger-based.
Schedule-based retraining runs at a fixed interval (daily or weekly), regardless of model quality. It is easy to implement, predictable, and well suited to rapidly changing domains (news recommendations, dynamic pricing).
Trigger-based retraining starts when drift or metric degradation is detected:
- Data drift: the distribution of input data has changed (detected with, for example, a KS test or PSI > 0.2)
- Performance drift: metrics on labeled data dropped below threshold
- Concept drift: the relationship between features and targets has changed
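The PSI check from the list above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `population_stability_index`, the ten equal-width bins, and the `1e-6` floor for empty bins are assumptions; the 0.2 threshold is the one quoted above.

```python
import math
import random

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a current sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic data for illustration: one unshifted and one shifted sample.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(f"no shift: {psi_same:.3f}, mean shift of 1 sigma: {psi_shifted:.3f}")
```

Bin edges are taken from the reference sample only, so the same edges can be stored at training time and reused on every monitoring run.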
In practice, the two are combined: drift triggers start retraining early, and a fixed schedule serves as a fallback.
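One way to combine the two is a single decision function that the orchestrator calls on every monitoring run. The sketch below is illustrative only: `should_retrain`, its parameter names, and the default thresholds (PSI 0.2, a 7-day maximum model age) are assumptions, not part of any library.

```python
from datetime import datetime, timedelta

def should_retrain(psi, metric, metric_floor, last_trained, now,
                   psi_threshold=0.2, max_age=timedelta(days=7)):
    """Return (decision, reason) for one monitoring run."""
    # `metric` may be None when fresh labels are not available yet.
    if metric is not None and metric < metric_floor:
        return True, "performance_drift"
    if psi > psi_threshold:
        return True, "data_drift"
    # Fallback: retrain anyway once the model is older than max_age.
    if now - last_trained >= max_age:
        return True, "schedule_fallback"
    return False, "no_trigger"

now = datetime(2024, 1, 15)
print(should_retrain(0.05, 0.91, 0.85, datetime(2024, 1, 12), now))
print(should_retrain(0.31, 0.91, 0.85, datetime(2024, 1, 12), now))
print(should_retrain(0.05, None, 0.85, datetime(2024, 1, 1), now))
```

Returning the reason alongside the decision makes it possible to record why each retraining cycle started.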
Architecture of the retraining system
[Monitoring] → [Drift Detected / Schedule] → [Data Collection]
→ [Data Validation] → [Training Job] → [Evaluation]
→ [A/B Test / Canary] → [Promotion] → [Monitoring]
The pipeline is run by an orchestrator such as Airflow, Prefect, Kubeflow Pipelines, or Vertex AI Pipelines.
Airflow DAG Example:
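from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables: replace with the project's own implementations.
def run_drift_detection():
    ...

def prepare_dataset():
    ...

def run_training():
    ...

dag = DAG(
    'model_retraining',
    schedule_interval='@weekly',  # Airflow 2.4+ prefers the `schedule` argument
    start_date=datetime(2024, 1, 1),  # required for scheduling; example value
    catchup=False,
)

check_drift = PythonOperator(
    task_id='check_data_drift',
    python_callable=run_drift_detection,
    dag=dag,
)

collect_data = PythonOperator(
    task_id='collect_training_data',
    python_callable=prepare_dataset,
    dag=dag,
)

train = PythonOperator(
    task_id='train_model',
    python_callable=run_training,
    dag=dag,
)

# Run drift detection first, then collect data, then train.
check_drift >> collect_data >> train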
Training data management
The key question is: what data should be included in retraining? Options:
- Full retrain: all historical data. Stable, but expensive in terms of time and computation.
- Rolling window: only data for the last N days/weeks. The model forgets history but adapts better to current patterns.
- Incremental learning: updating the existing model on new data without training from scratch. Only some algorithms support it.
- Weighted samples: older data is kept but given less weight. A balance between stability and adaptation.
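The rolling-window and weighted-sample options can be sketched with the standard library alone. The record layout (a `ts` timestamp per row), the helper names, and the 30-day half-life are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

def rolling_window(rows, now, days):
    """Keep only rows whose timestamp falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [r for r in rows if r["ts"] >= cutoff]

def decay_weights(rows, now, half_life_days=30.0):
    """Exponential time decay: a row `half_life_days` old gets weight 0.5."""
    return [0.5 ** ((now - r["ts"]).total_seconds() / 86400 / half_life_days)
            for r in rows]

now = datetime(2024, 6, 30)
# Four example rows aged 0, 30, 60 and 120 days.
rows = [{"ts": now - timedelta(days=d), "x": d} for d in (0, 30, 60, 120)]

recent = rolling_window(rows, now, days=60)
weights = decay_weights(rows, now)
print(len(recent), [round(w, 4) for w in weights])
```

The resulting weights would typically be passed to the training routine as per-sample weights, for example through the `sample_weight` argument that many scikit-learn estimators accept in `fit`.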
Validation gate before promotion
An automatically retrained model should not be released into production without validation:
def validate_new_model(new_model, current_model, test_dataset):
    # `evaluate` is assumed to return a dict with 'auc' and 'p95_latency_ms'.
    new_metrics = evaluate(new_model, test_dataset)
    current_metrics = evaluate(current_model, test_dataset)

    # The new model may be at most 1% worse than the current one on AUC
    if new_metrics['auc'] < current_metrics['auc'] * 0.99:
        raise ValueError(f"New model AUC {new_metrics['auc']:.4f} "
                         f"worse than current {current_metrics['auc']:.4f}")

    # Latency check: p95 inference latency must stay within 100 ms
    if new_metrics['p95_latency_ms'] > 100:
        raise ValueError("Inference too slow")

    return True
Managing experiments in auto-retraining
Each retraining cycle is logged in MLflow, recording the data version (DVC hash), hyperparameters, metrics, and training time. This allows for retrospective analysis of degradation and pinpointing the moment when the model began to deteriorate.
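Once every cycle is recorded, finding the onset of degradation becomes a simple scan over the run history. The sketch below assumes the runs have already been exported from the tracking store into plain dictionaries; the field names (`run_date`, `data_version`, `auc`), the sample values, and the 0.02 tolerance are all hypothetical.

```python
def first_degraded_run(runs, metric="auc", tolerance=0.02):
    """Return the first run whose metric fell more than `tolerance` below
    the best value seen in earlier runs, or None if there is none."""
    best = None
    # ISO-formatted date strings sort chronologically.
    for run in sorted(runs, key=lambda r: r["run_date"]):
        value = run[metric]
        if best is not None and value < best - tolerance:
            return run
        best = value if best is None else max(best, value)
    return None

# Made-up history for illustration only.
history = [
    {"run_date": "2024-03-04", "data_version": "a1f3", "auc": 0.912},
    {"run_date": "2024-03-11", "data_version": "b7c2", "auc": 0.915},
    {"run_date": "2024-03-18", "data_version": "c9d0", "auc": 0.908},
    {"run_date": "2024-03-25", "data_version": "d4e8", "auc": 0.881},
]
onset = first_degraded_run(history)
print(onset["run_date"], onset["data_version"])
```

Because each record carries the data version, the flagged run points directly at the dataset snapshot to inspect.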
A typical result: the team moves from manual retraining "when remembered" (every 2-3 months) to an automated cycle with weekly updates and quality metrics that are always up-to-date.







