CI/CD Setup for ML Models (Auto Training and Deployment)


Setting up CI/CD for ML models: automatic training and deployment

CI/CD for ML is fundamentally different from classic software CI/CD: it requires testing not only code but also data, model metrics, and inference performance. A full-fledged ML pipeline includes automatic retraining, model quality validation, promotion to staging/production, and rollback in case of degradation.

ML CI/CD Architecture

[Git Push] → [Data Validation] → [Model Training] → [Model Evaluation]
     → [Model Registry] → [Staging Deploy] → [Integration Tests]
     → [Canary Deploy] → [Production Promote] → [Monitoring]

Each stage is a separate task in the pipeline with clear success/failure criteria.
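
The stage-by-stage gating described above can be sketched in a few lines of Python. This is only an illustration, not a replacement for a real orchestrator: the `Stage` class, the `run_pipeline` function, and the hard-coded stage outcomes are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True on success, False on failure


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails."""
    completed: List[str] = []
    for stage in stages:
        if not stage.run():
            return False, completed
        completed.append(stage.name)
    return True, completed


# Hypothetical stage outcomes, hard-coded for illustration only.
stages = [
    Stage("data_validation", lambda: True),
    Stage("model_training", lambda: True),
    Stage("model_evaluation", lambda: False),  # the metric gate fails here
    Stage("staging_deploy", lambda: True),
]

ok, completed = run_pipeline(stages)
print(ok, completed)
```

Because the evaluation stage fails, the staging deploy never runs, which is the behavior each CI job dependency is meant to enforce.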

Orchestration tools

GitHub Actions / GitLab CI: suitable for small teams. Running training on self-hosted runners with GPUs is usually sufficient.

Kubeflow Pipelines: a Kubernetes-native orchestrator for ML. Each step runs in a separate container. It supports caching of intermediate results, a visual pipeline graph, and pipeline versioning.

MLflow Projects + Prefect/Airflow: a less monolithic approach. Prefect or Airflow handles orchestration, and MLflow handles experiment tracking.

Vertex AI Pipelines / SageMaker Pipelines: managed options for Google Cloud and AWS, respectively.

Example pipeline on GitHub Actions

name: ML Training Pipeline

on:
  schedule:
    - cron: '0 2 * * 1'  # Weekly, Mondays at 02:00 UTC
  push:
    paths:
      - 'src/train.py'
      - 'params.yaml'

jobs:
  validate-data:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Run Great Expectations
        run: python validate_data.py --suite training_data

  train-model:
    needs: validate-data
    runs-on: [self-hosted, gpu]
    steps:
      # Jobs run independently, so each one checks out the repository itself
      - uses: actions/checkout@v4
      - name: Train
        run: python train.py --config params.yaml
      # evaluate.py is expected to exit non-zero when the metric is below the threshold
      - name: Evaluate
        run: python evaluate.py --threshold 0.92

  promote-to-staging:
    needs: train-model
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Register and promote
        run: |
          python scripts/promote_model.py \
            --stage staging \
            --min-f1 0.92

Model testing as part of CI

Data validation (Great Expectations, Pandera): checks the schema, feature distributions, and outliers in the training data before training starts.
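
To show the kind of checks involved, here is a dependency-free sketch in plain Python. It is a stand-in for illustration and does not use the Great Expectations or Pandera APIs; the column names, the z-score limit, and the allowed outlier share are all assumptions.

```python
from statistics import mean, stdev
from typing import Any, Dict, List

# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA: Dict[str, type] = {"age": int, "income": float}


def validate_rows(
    rows: List[Dict[str, Any]],
    schema: Dict[str, type] = EXPECTED_SCHEMA,
    z_limit: float = 4.0,
    max_outlier_share: float = 0.01,
) -> List[str]:
    """Return a list of problems; an empty list means the data passed."""
    errors: List[str] = []

    # 1. Schema check: every row has exactly the expected columns and types.
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            errors.append(f"row {i}: unexpected columns {sorted(row)}")
            continue
        for col, expected in schema.items():
            if not isinstance(row[col], expected):
                errors.append(f"row {i}: {col} is not {expected.__name__}")
    if errors:
        return errors  # distribution checks are meaningless on malformed rows

    # 2. Outlier check: share of values more than z_limit std devs from the mean.
    for col in schema:
        values = [float(row[col]) for row in rows]
        if len(values) < 2:
            continue
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue
        share = sum(abs(v - mu) / sigma > z_limit for v in values) / len(values)
        if share > max_outlier_share:
            errors.append(f"{col}: outlier share {share:.3f} exceeds {max_outlier_share}")
    return errors
```

In CI, a non-empty result would fail the validation job so that training never starts on bad data.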

Model evaluation gates: the model advances to the next stage only if its metrics exceed the thresholds. Compare against the current production model rather than only an absolute threshold: the new version should be at least 1-2% better than the existing one.
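
A gate of this kind might look like the sketch below. It assumes the 1-2% margin is meant as a relative improvement in F1; the function name and the default margin are hypothetical.

```python
def passes_gate(
    candidate_f1: float,
    production_f1: float,
    min_relative_gain: float = 0.01,
) -> bool:
    """True if the candidate beats production by at least the relative margin."""
    return candidate_f1 >= production_f1 * (1.0 + min_relative_gain)


# Production F1 is 0.92, so the candidate needs at least 0.9292 to pass.
print(passes_gate(0.93, 0.92))   # True
print(passes_gate(0.925, 0.92))  # False
```

Comparing against the production score keeps the gate meaningful as the data shifts, whereas a fixed absolute threshold can be passed by a model that is worse than the one already serving traffic.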

Inference latency tests: automatic testing of p95 inference latency. If the model has become more accurate but is three times slower, that's not progress.
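
A latency check can be sketched as follows. The nearest-rank percentile, the helper names, and the 1.2x tolerance against the baseline are assumptions chosen for illustration, not a fixed standard.

```python
import time
from typing import Any, Callable, Iterable, List


def p95(samples: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def measure_latencies_ms(predict: Callable[[Any], Any], inputs: Iterable[Any]) -> List[float]:
    """Time each call to predict and return the latencies in milliseconds."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def latency_regressed(candidate_p95_ms: float, baseline_p95_ms: float, max_ratio: float = 1.2) -> bool:
    """True if the candidate's p95 is more than max_ratio times the baseline."""
    return candidate_p95_ms > baseline_p95_ms * max_ratio
```

A CI step would call `measure_latencies_ms` on a fixed set of sample inputs for both models and fail the job when `latency_regressed` returns True.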

Shadow testing: the new model receives a copy of production traffic in parallel with the current one, and the results are compared without affecting users.
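
The core of shadow testing can be sketched like this. Models are represented as plain callables, and `shadow_compare` is a hypothetical helper; a real setup would mirror traffic asynchronously instead of calling the candidate inline.

```python
from typing import Any, Callable, List, Tuple


def shadow_compare(
    requests: List[Any],
    production_model: Callable[[Any], Any],
    candidate_model: Callable[[Any], Any],
) -> Tuple[List[Any], float]:
    """Serve production outputs and measure how often the candidate disagrees."""
    served = []
    disagreements = 0
    for req in requests:
        prod_out = production_model(req)
        served.append(prod_out)  # only the production output reaches the user
        try:
            cand_out = candidate_model(req)
        except Exception:
            disagreements += 1  # a candidate failure counts as a disagreement
            continue
        if cand_out != prod_out:
            disagreements += 1
    rate = disagreements / len(requests) if requests else 0.0
    return served, rate


served, rate = shadow_compare([0, 1, 2, 3], lambda x: x > 0, lambda x: x > 1)
print(served, rate)
```

Note that a failure in the candidate never changes what the user receives, which is what makes this strategy low-risk.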

Deployment strategies

Strategy | Risk | Rollback | Use case
Blue-Green | Medium | Instant | Small models
Canary (5% → 25% → 100%) | Low | Fast | Critical services
Shadow | Minimal | Not needed | Risk-free testing
Rolling | Medium | Slow | Stateless inference
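
For the canary strategy, a minimal routing sketch is shown below. It assumes traffic is split by a stable hash of a user ID so that a given user always sees the same model; the function names are hypothetical, and in practice the split is usually done by the load balancer or service mesh.

```python
import hashlib

CANARY_STEPS = [5, 25, 100]  # percent of traffic, as in the table above


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically assign a user to the canary based on a hash bucket."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket 0..99
    return bucket < canary_percent


def next_canary_percent(current: int, healthy: bool) -> int:
    """Advance to the next traffic step, or drop to 0 if the canary is unhealthy."""
    if not healthy:
        return 0
    for step in CANARY_STEPS:
        if step > current:
            return step
    return 100
```

Because buckets are fixed per user, everyone who was in the 5% group stays on the new model when the share grows to 25%, which keeps the user experience consistent during the rollout.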

Rollback mechanism

Automatic rollback should be triggered when:

  • Business metrics (CTR, conversion) fall by more than X%
  • The error rate of the inference service exceeds the threshold
  • p99 latency exceeds the SLA

# Monitoring and auto-rollback
# model_registry and alert_team stand for your registry client and alerting hook.
# Roll back if the current model's F1 falls more than 3% below the production baseline.
if current_model_metrics['f1'] < production_model_metrics['f1'] * 0.97:
    model_registry.transition_to_stage(current_version, 'Archived')
    model_registry.transition_to_stage(previous_version, 'Production')
    alert_team("Auto-rollback triggered")
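
The three rollback triggers listed above can be combined into one decision function. This is a sketch under assumptions: the `HealthSnapshot` fields, the default thresholds, and the reason labels are hypothetical and would come from your own monitoring and SLA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthSnapshot:
    ctr: float             # business metric, e.g. click-through rate
    error_rate: float      # share of failed inference requests
    latency_p99_ms: float  # p99 latency in milliseconds


def rollback_reasons(
    current: HealthSnapshot,
    baseline_ctr: float,
    max_ctr_drop: float = 0.05,      # the "X%" from the list, assumed to be 5%
    max_error_rate: float = 0.01,
    latency_sla_ms: float = 200.0,
) -> List[str]:
    """Return the triggers that fired; an empty list means no rollback."""
    reasons = []
    if current.ctr < baseline_ctr * (1.0 - max_ctr_drop):
        reasons.append("business_metric_drop")
    if current.error_rate > max_error_rate:
        reasons.append("error_rate_exceeded")
    if current.latency_p99_ms > latency_sla_ms:
        reasons.append("latency_sla_exceeded")
    return reasons


snapshot = HealthSnapshot(ctr=0.040, error_rate=0.002, latency_p99_ms=250.0)
print(rollback_reasons(snapshot, baseline_ctr=0.050))
```

Returning the list of reasons rather than a single boolean makes the alert sent to the team more useful, since it says which condition caused the rollback.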

Setup times

Basic pipeline (training + deployment to staging): 1 week. Full pipeline with tests, canary deployment, and automatic rollback: 3-4 weeks. Enterprise version with Kubeflow and full integration into corporate CI/CD: 6-8 weeks.