Setting up a Model Registry to manage model versions
A Model Registry is a centralized repository of ML model metadata: versions, artifacts, metrics, and lifecycle stages. Without one, teams store models as files on shared drives, lose track of which code version produced which model weights, and cannot tell which model is currently running in production.
What does a Model Registry provide?
- A single catalog of all trained models with metrics and parameters
- Stage management: Staging → Production → Archived
- History of transitions between versions, with the author and reason recorded
- API for programmatically promoting the model into production
- Integration with CI/CD for automatic deployment when changing stages
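To make the stage and history ideas in the list above concrete, here is a minimal in-memory sketch. `ToyRegistry` and `Transition` are hypothetical names invented for illustration and are not part of MLflow or any other product; a real registry persists this state in a database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class Transition:
    version: int
    from_stage: str
    to_stage: str
    author: str
    reason: str
    at: datetime


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry for a single model name."""

    stages: dict = field(default_factory=dict)   # version -> current stage
    history: list = field(default_factory=list)  # append-only audit log

    def register(self, version: int) -> None:
        self.stages[version] = "None"

    def transition(self, version, to_stage, author, reason, archive_existing=False):
        if to_stage not in STAGES:
            raise ValueError(f"unknown stage: {to_stage}")
        if version not in self.stages:
            raise KeyError(f"unknown version: {version}")
        if archive_existing:
            # Move every other version currently in the target stage to Archived.
            for other, stage in list(self.stages.items()):
                if other != version and stage == to_stage:
                    self._move(other, "Archived", author, f"superseded by v{version}")
        self._move(version, to_stage, author, reason)

    def _move(self, version, to_stage, author, reason):
        self.history.append(
            Transition(version, self.stages[version], to_stage, author, reason,
                       datetime.now(timezone.utc))
        )
        self.stages[version] = to_stage
```

Every stage change goes through `_move`, so the audit log cannot drift from the current state, which is the property the history bullet relies on.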
MLflow Model Registry
MLflow Model Registry is among the most widely used open-source options. It runs on top of an existing MLflow Tracking Server.
Model registration after training:
import mlflow
with mlflow.start_run():
    # ... training: fit `model` here ...
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="fraud-detector-v2",
    )
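A model that was logged in an earlier run without `registered_model_name` can also be registered after the fact with `mlflow.register_model()`. The sketch below assumes the model was logged under the artifact path "model", as in the example above; the helper names `run_model_uri` and `register_logged_model` are hypothetical.

```python
def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build the runs:/ URI that points at a model logged inside a run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_logged_model(run_id: str, name: str, artifact_path: str = "model"):
    """Register an already-logged model as a new version under `name`."""
    import mlflow  # imported lazily so the URI helper has no dependencies

    return mlflow.register_model(
        model_uri=run_model_uri(run_id, artifact_path),
        name=name,
    )
```

This is useful when registration should happen only for runs that pass an offline evaluation, rather than for every training run.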
Stage management via API:
client = mlflow.MlflowClient()
client.transition_model_version_stage(
    name="fraud-detector-v2",
    version=3,
    stage="Production",
    archive_existing_versions=True,
)
Loading the production model into the inference service:
model = mlflow.pyfunc.load_model(
    model_uri="models:/fraud-detector-v2/Production"
)
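Recent MLflow 2.x releases mark registry stages as deprecated in favor of model version aliases, so a team on a newer version may prefer the alias-based equivalent of the two snippets above. The following is a sketch under that assumption; the alias name "champion" and the helper names are illustrative choices, not something the registry requires.

```python
def alias_uri(name: str, alias: str) -> str:
    """Build a registry URI that resolves a model version by alias."""
    return f"models:/{name}@{alias}"


def promote_with_alias(name: str, version: int, alias: str = "champion") -> None:
    """Point `alias` at the given version; any previous target is replaced."""
    from mlflow import MlflowClient  # lazy import: only needed when called

    MlflowClient().set_registered_model_alias(
        name=name, alias=alias, version=str(version)
    )


def load_by_alias(name: str, alias: str = "champion"):
    """Load whichever version the alias currently points at."""
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(alias_uri(name, alias))
```

The inference service then calls `load_by_alias("fraud-detector-v2")` and never needs to know a version number.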
Alternatives and enterprise options
Weights & Biases (W&B) Artifacts is convenient for teams already using W&B to track experiments. It supports lineage between datasets and models.
Vertex AI Model Registry is a managed GCP service with integration into Vertex AI Pipelines and Vertex AI Endpoints.
SageMaker Model Registry is the AWS equivalent, tightly integrated with SageMaker Pipelines and CodePipeline.
Hugging Face Hub is the de facto standard for LLMs and transformer models, and supports private repositories and teams.
One-week implementation plan
Days 1-2: Deploy MLflow with a PostgreSQL backend and an S3 artifact store. Configure authentication.
Day 3: Add flavor-specific log_model() calls (for example, mlflow.sklearn.log_model()) and mlflow.register_model() to existing training scripts.
Day 4: Set up an approval workflow: a webhook or GitHub Action that requires manual approval before promoting to Production.
Day 5: Integrate with the inference service: load models by stage, not by file path. Set up alerts for version changes in Production.
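The Day 4 approval gate and the Day 5 version-change alert can be sketched as two small functions. This is an illustration under stated assumptions, not a complete workflow: `PromotionRequest`, `promote_if_approved`, and `check_production_version` are hypothetical names, the rule that a requester cannot approve their own promotion is an assumed policy, and `client` is expected to behave like `mlflow.MlflowClient`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PromotionRequest:
    """Hypothetical record produced by the approval webhook or CI job."""

    name: str
    version: int
    requested_by: str
    approved_by: Optional[str] = None


def promote_if_approved(client, request: PromotionRequest) -> None:
    """Promote to Production only after a second person has approved."""
    if request.approved_by is None:
        raise PermissionError("promotion requires manual approval")
    if request.approved_by == request.requested_by:
        raise PermissionError("requester cannot approve their own promotion")
    client.transition_model_version_stage(
        name=request.name,
        version=request.version,
        stage="Production",
        archive_existing_versions=True,
    )


def check_production_version(
    client, name: str, last_seen: Optional[str], notify: Callable[[str], None]
) -> Optional[str]:
    """Return the current Production version and notify if it changed."""
    versions = client.get_latest_versions(name, stages=["Production"])
    current = str(versions[0].version) if versions else None
    if current != last_seen:
        notify(f"{name}: Production version changed from {last_seen} to {current}")
    return current
```

A scheduler would call `check_production_version` periodically, store the returned value, and pass it back as `last_seen` on the next call; `notify` can be any callable, such as a function that posts to a chat channel.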
Key practices
Each model version should record: the dataset hash (via DVC), the code version (git commit), metrics on the validation and test sets, and hardware information (GPU type and count). This makes training reproducible and simplifies debugging when a model degrades in production.
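One way to attach this metadata is as tags on the model version. The sketch below is a minimal illustration: the tag keys and helper names are hypothetical, and the plain SHA-256 of the dataset file is a stand-in for the hash that DVC would track. Only `set_model_version_tag` is an MLflow client call.

```python
import hashlib
import subprocess


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_commit() -> str:
    """Return the current commit hash; assumes it runs inside a git checkout."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def build_version_tags(dataset_path, commit, metrics, gpu_type, gpu_count):
    """Collect reproducibility metadata as string tags (key names are illustrative)."""
    tags = {
        "dataset_sha256": file_sha256(dataset_path),
        "git_commit": commit,
        "gpu_type": gpu_type,
        "gpu_count": str(gpu_count),
    }
    for key, value in metrics.items():
        tags[f"metric.{key}"] = f"{value:.6g}"
    return tags


def tag_model_version(name: str, version: int, tags: dict) -> None:
    """Write the tags to a registered model version."""
    from mlflow import MlflowClient  # lazy import: only needed when called

    client = MlflowClient()
    for key, value in tags.items():
        client.set_model_version_tag(name=name, version=str(version), key=key, value=value)
```

A training script would call `build_version_tags(...)` with `git_commit()` right after evaluation and pass the result to `tag_model_version`, so the metadata is written in the same job that produced the model.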







