CVAT Image and Video Labeling Integration

We design and deploy artificial intelligence systems, from prototypes to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business.

Integrating and configuring CVAT for image and video annotation

CVAT (Computer Vision Annotation Tool) is an open-source data annotation platform, originally developed at Intel, and the de facto standard for teams that don't want to pay $2–5 per image to third-party services. But "installing CVAT" and "setting up an effective annotation pipeline" are two different tasks.

CVAT Deployment: Production Configuration

# docker-compose.override.yml
version: '3.3'

services:
  cvat_server:
    environment:
      DJANGO_MODWSGI_EXTRA_ARGS: ""
      ALLOWED_HOSTS: "*"
      CVAT_REDIS_HOST: "cvat_redis"
      CVAT_POSTGRES_HOST: "cvat_db"
      # Store data in S3 instead of local storage
      CVAT_DEFAULT_STORAGE_TYPE: "cloud_storage"
      AWS_ACCESS_KEY_ID: "${AWS_ACCESS_KEY_ID}"
      AWS_SECRET_ACCESS_KEY: "${AWS_SECRET_ACCESS_KEY}"
      AWS_STORAGE_BUCKET_NAME: "cvat-data"

  cvat_worker_annotation:
    deploy:
      replicas: 4  # parallel workers for AI-assisted annotation

  cvat_worker_export:
    deploy:
      replicas: 2

  traefik:
    command:
      - "--providers.docker.exposedByDefault=false"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.email=[email protected]"  # ACME e-mail for Let's Encrypt (flag reconstructed; resolver name is an assumption)
# Quick deployment with SSL
git clone https://github.com/opencv/cvat.git
cd cvat
docker compose -f docker-compose.yml \
               -f docker-compose.override.yml \
               -f components/serverless/docker-compose.serverless.yml up -d

# Create a superuser
docker exec -it cvat_server python manage.py createsuperuser
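Once the stack is up, it's worth verifying that the API responds before pointing annotators at it. A minimal check against CVAT's `/api/server/about` endpoint (the host and the `health_url` helper name are our own):

```python
import json
import urllib.request


def health_url(host: str) -> str:
    """Build the URL of CVAT's server-info endpoint."""
    return host.rstrip('/') + '/api/server/about'


def check_cvat(host: str) -> dict:
    """Fetch server metadata; raises if the server is unreachable."""
    with urllib.request.urlopen(health_url(host), timeout=10) as resp:
        return json.load(resp)


if __name__ == '__main__':
    # Assumes CVAT is reachable at this (hypothetical) address
    info = check_cvat('https://cvat.example.com')
    print(info.get('version'))
```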

AI-assisted annotation: semi-automatic labeling

The main reason to use CVAT in 2024 is its integration with Nuclio serverless for automatic pre-annotation: we load a model, it suggests annotations, and humans only make the necessary corrections.

# nuclio/yolov8_detector/main.py
import json
import base64
import numpy as np
import cv2
from ultralytics import YOLO

model = YOLO('/opt/nuclio/yolov8l.pt')

def handler(context, event):
    """Nuclio function: CVAT calls us for every image"""
    data = event.body

    buf = base64.b64decode(data['image'])
    img = cv2.imdecode(np.frombuffer(buf, np.uint8), cv2.IMREAD_COLOR)

    threshold = float(data.get('threshold', 0.45))
    results = model(img, conf=threshold)

    annotations = []
    for box in results[0].boxes:
        x1, y1, x2, y2 = map(float, box.xyxy[0])
        cls_name = model.names[int(box.cls)]

        annotations.append({
            'confidence': float(box.conf),
            'label': cls_name,
            'points': [x1, y1, x2, y2],
            'type': 'rectangle'
        })

    return context.Response(
        body=json.dumps(annotations),
        headers={'Content-Type': 'application/json'},
        status_code=200
    )
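The response shape the handler returns can be unit-tested without a GPU by factoring the conversion out of the model call. A sketch (the `boxes_to_cvat` helper is our own, not part of CVAT):

```python
def boxes_to_cvat(boxes, names, threshold=0.45):
    """Convert (x1, y1, x2, y2, conf, cls) tuples into the list of
    dicts CVAT expects from a detector function."""
    annotations = []
    for x1, y1, x2, y2, conf, cls in boxes:
        if conf < threshold:
            continue  # drop low-confidence detections
        annotations.append({
            'confidence': conf,
            'label': names[cls],
            'points': [float(x1), float(y1), float(x2), float(y2)],
            'type': 'rectangle',
        })
    return annotations


# one confident detection kept, one weak detection dropped
print(boxes_to_cvat([(10, 20, 50, 80, 0.9, 0),
                     (0, 0, 5, 5, 0.1, 0)], {0: 'defect'}))
```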
# nuclio function.yaml
apiVersion: nuclio.io/v1beta1
kind: Function
metadata:
  name: cvat-yolov8-detector
  # CVAT discovers the function through metadata.annotations:
  # name, type: detector, and a JSON `spec` listing the model's labels
spec:
  runtime: python:3.9
  handler: main:handler
  resources:
    limits:
      nvidia.com/gpu: 1
  env:
    - name: MODEL_PATH
      value: /opt/nuclio/yolov8l.pt

Automatic data import and export

from cvat_sdk import make_client
from cvat_sdk.models import TaskWriteRequest, DataRequest
import os

class CVATIntegration:
    def __init__(self, host: str, credentials: tuple):
        self.client = make_client(host=host, credentials=credentials)

    def create_task_from_s3(self, task_name: str, s3_prefix: str,
                            labels: list[dict]) -> int:
        """Create an annotation task from an S3 bucket"""
        task = self.client.tasks.create(TaskWriteRequest(
            name=task_name,
            labels=labels,
            segment_size=100,  # images per segment (one job)
            overlap=5
        ))

        # Attach files from the cloud storage already registered in CVAT
        # (cloud files go into server_files)
        self.client.api_client.tasks_api.create_data(
            task.id,
            data_request=DataRequest(
                image_quality=70,
                cloud_storage_id=1,  # ID of the configured S3 storage
                server_files=[f'{s3_prefix}/{f}'
                              for f in self._list_s3_files(s3_prefix)]
            )
        )
        return task.id

    def export_annotations(self, task_id: int,
                           format: str = 'YOLO 1.1') -> str:
        """Export annotations in YOLO/COCO/Pascal VOC format"""
        export_path = f'/tmp/annotations_{task_id}.zip'
        task = self.client.tasks.retrieve(task_id)
        task.export_dataset(format_name=format, filename=export_path)
        return export_path

    def get_annotation_progress(self, task_id: int) -> dict:
        task = self.client.tasks.retrieve(task_id)
        jobs = task.get_jobs()
        return {
            'total_frames': task.size,
            'jobs_total': len(jobs),
            'jobs_in_acceptance': sum(1 for j in jobs
                                      if j.stage == 'acceptance')
        }
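`create_task_from_s3` takes labels as a list of dicts; CVAT expects at least a `name` per label, optionally `color` and `attributes`. A small builder (the `make_labels` helper name is ours):

```python
def make_labels(names, colors=None):
    """Build a CVAT label spec from plain label names.

    colors maps a label name to a hex string, e.g. '#ff0000'.
    """
    colors = colors or {}
    labels = []
    for name in names:
        label = {'name': name, 'attributes': []}
        if name in colors:
            label['color'] = colors[name]
        labels.append(label)
    return labels


# e.g. two defect classes, one with an explicit color
print(make_labels(['scratch', 'dent'], {'scratch': '#ff0000'}))
```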

AI-assisted vs. manual annotation speed

Actual figures from an industrial defect labeling project (5,000 images):

Method                                          Time/image   Total for 5,000 images
Manual annotation from scratch                  4–7 min      20–35 business days
AI pre-annotation + correction (80% accuracy)   45–90 sec    4–8 business days
AI pre-annotation + correction (95% accuracy)   15–30 sec    1–2 business days
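The totals in the table follow from simple arithmetic; they line up if we assume two annotators working 8-hour days (the team size is our assumption, the table doesn't state it):

```python
def annotation_days(n_images, sec_per_image, annotators=2, hours_per_day=8):
    """Business days needed to annotate n_images at a given pace."""
    total_hours = n_images * sec_per_image / 3600
    return total_hours / (annotators * hours_per_day)


# 4 min/image over 5,000 images for a team of two
print(round(annotation_days(5000, 4 * 60)))  # → 21
```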

When prediction quality drops below ~70%, the AI assistant actually slows the work down: the annotator spends more time correcting bad predictions than annotating from scratch.

Annotation quality management

  • Overlap jobs: 10–15% of images are annotated by two annotators independently; we then compare the results by IoU
  • Honeypot: specially prepared images with known ground-truth annotations, used to check the quality of a specific annotator
  • Consensus annotation: three annotators for complex cases + majority vote
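The IoU comparison behind the overlap check is a few lines for axis-aligned boxes; a sketch:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# identical boxes agree perfectly
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # → 1.0
```

In an overlap job, pairs of annotations whose IoU falls below an agreed threshold (e.g. 0.8) would be flagged for review.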
Type of work                         Timeline
Deploy CVAT + basic setup            1–2 weeks
CVAT + AI-assisted annotation        3–5 weeks
Full pipeline: CVAT + QA + CI/CD     6–10 weeks