# Integration and configuration of CVAT for image and video tagging
CVAT (Computer Vision Annotation Tool) is an open-source data annotation platform, originally developed at Intel, that has become the de facto standard for teams that don't want to pay $2–5 per image to third-party labeling services. But "installing CVAT" and "building an effective annotation pipeline" are two different tasks.
## CVAT deployment: production configuration
```yaml
# docker-compose.override.yml
version: '3.3'

services:
  cvat_server:
    environment:
      DJANGO_MODWSGI_EXTRA_ARGS: ""
      ALLOWED_HOSTS: "*"
      CVAT_REDIS_HOST: "cvat_redis"
      CVAT_POSTGRES_HOST: "cvat_db"
      # S3 storage instead of local
      CVAT_DEFAULT_STORAGE_TYPE: "cloud_storage"
      AWS_ACCESS_KEY_ID: "${AWS_ACCESS_KEY_ID}"
      AWS_SECRET_ACCESS_KEY: "${AWS_SECRET_ACCESS_KEY}"
      AWS_STORAGE_BUCKET_NAME: "cvat-data"

  cvat_worker_annotation:
    deploy:
      replicas: 4  # parallel workers for AI-assisted annotation

  cvat_worker_export:
    deploy:
      replicas: 2

  traefik:
    command:
      - "--providers.docker.exposedByDefault=false"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.[email protected]"
```
```bash
# Quick deployment with SSL
git clone https://github.com/opencv/cvat.git
cd cvat
docker compose -f docker-compose.yml \
  -f docker-compose.override.yml \
  -f components/serverless/docker-compose.serverless.yml up -d

# Create a superuser
docker exec -it cvat_server python manage.py createsuperuser
```
## AI-assisted annotation: semi-automated tagging
The main reason to use CVAT in 2024 is its integration with Nuclio serverless functions for automatic pre-annotation: you deploy a model, it proposes labels, and annotators only make the necessary corrections.
```python
# nuclio/yolov8_detector/main.py
import base64
import json

import cv2
import numpy as np
from ultralytics import YOLO

# Loaded once per container, not per request
model = YOLO('/opt/nuclio/yolov8l.pt')


def handler(context, event):
    """Nuclio handler: CVAT calls this function for every image."""
    data = event.body
    buf = base64.b64decode(data['image'])
    img = cv2.imdecode(np.frombuffer(buf, np.uint8), cv2.IMREAD_COLOR)
    threshold = float(data.get('threshold', 0.45))

    results = model(img, conf=threshold)

    annotations = []
    for box in results[0].boxes:
        x1, y1, x2, y2 = map(float, box.xyxy[0])
        cls_name = model.names[int(box.cls)]
        annotations.append({
            'confidence': float(box.conf),
            'label': cls_name,
            'points': [x1, y1, x2, y2],
            'type': 'rectangle',
        })

    return context.Response(
        body=json.dumps(annotations),
        headers={'Content-Type': 'application/json'},
        status_code=200,
    )
```
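The conversion loop inside the handler can be factored into a pure function that is easy to unit-test without a GPU or a loaded model. A minimal sketch (the function name and the plain box tuples standing in for `results[0].boxes` are ours):

```python
def boxes_to_cvat(boxes, class_names):
    """Convert (x1, y1, x2, y2, class_id, confidence) tuples into
    the annotation dicts a CVAT detector function returns."""
    annotations = []
    for x1, y1, x2, y2, cls_id, conf in boxes:
        annotations.append({
            'confidence': float(conf),
            'label': class_names[int(cls_id)],
            'points': [float(x1), float(y1), float(x2), float(y2)],
            'type': 'rectangle',
        })
    return annotations

# Example: two detections from a hypothetical defect model
names = {0: 'scratch', 1: 'dent'}
anns = boxes_to_cvat([(10, 20, 110, 220, 0, 0.91),
                      (5, 5, 50, 60, 1, 0.47)], names)
```

Keeping the model inference and the response formatting separate also makes it trivial to swap YOLOv8 for another detector later.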
```yaml
# nuclio/yolov8_detector/function.yaml
apiVersion: nuclio.io/v1beta1
kind: Function
metadata:
  name: cvat-yolov8-detector
spec:
  runtime: python:3.9
  handler: main:handler
  resources:
    limits:
      nvidia.com/gpu: 1
  env:
    - name: MODEL_PATH
      value: /opt/nuclio/yolov8l.pt
```
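With both files in place, the function can be deployed with `nuctl`. The project name and path below match the layout assumed above; adjust them for your setup:

```shell
# Deploy the detector into the local Docker platform CVAT uses
nuctl deploy --project-name cvat \
  --path ./nuclio/yolov8_detector \
  --platform local
```

Once the function reports a "ready" state, it appears under Models in the CVAT UI and can be invoked from any annotation job.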
## Automatic data import and export
```python
from cvat_sdk import make_client
from cvat_sdk.models import TaskWriteRequest, DataRequest


class CVATIntegration:
    def __init__(self, host: str, credentials: tuple):
        self.client = make_client(host=host, credentials=credentials)

    def create_task_from_s3(self, task_name: str, s3_prefix: str,
                            labels: list[dict]) -> int:
        """Create an annotation task backed by an S3 bucket."""
        task = self.client.tasks.create(TaskWriteRequest(
            name=task_name,
            labels=labels,
            segment_size=100,  # images per job segment
            overlap=5,
        ))
        # Attach data from the attached cloud storage
        self.client.api_client.tasks_api.create_data(
            task.id,
            data_request=DataRequest(
                cloud_storage_id=1,  # ID of the configured S3 storage
                server_files=[f'{s3_prefix}/{f}'
                              for f in self._list_s3_files(s3_prefix)],
            ),
        )
        return task.id

    def export_annotations(self, task_id: int,
                           format_name: str = 'YOLO 1.1') -> str:
        """Export annotations in YOLO/COCO/Pascal VOC format."""
        export_path = f'/tmp/annotations_{task_id}.zip'
        task = self.client.tasks.retrieve(task_id)
        task.export_dataset(format_name=format_name, filename=export_path)
        return export_path

    def get_annotation_progress(self, task_id: int) -> dict:
        task = self.client.tasks.retrieve(task_id)
        jobs = task.get_jobs()
        return {
            'total_frames': task.size,
            'completed_jobs': sum(1 for j in jobs if j.state == 'completed'),
            'total_jobs': len(jobs),
        }

    def _list_s3_files(self, s3_prefix: str) -> list[str]:
        # List object keys under the prefix, e.g. via boto3 list_objects_v2
        raise NotImplementedError
```
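The `labels` argument follows CVAT's label spec: each entry is a dict with a required `name` plus optional `color` and `attributes`. A minimal example (the label names are illustrative):

```python
# Minimal CVAT label spec for a defect-detection task
labels = [
    {'name': 'scratch', 'color': '#ff0000', 'attributes': []},
    {'name': 'dent', 'color': '#00ff00', 'attributes': []},
]
```

The same list can be reused when deploying the Nuclio detector, so model class names and task labels stay in sync.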
## AI-assisted vs. manual annotation speed
Actual figures from an industrial defect-labeling project (5,000 images):
| Method | Time per image | Total, 5,000 images |
|---|---|---|
| Manual labeling from scratch | 4–7 min | 20–35 business days |
| AI pre-annotation + correction (80% accuracy) | 45–90 sec | 4–8 business days |
| AI pre-annotation + correction (95% accuracy) | 15–30 sec | 1–2 business days |
Below roughly 70% prediction quality, the AI assistant actually slows work down: the annotator spends more time correcting bad predictions than they would labeling from scratch.
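The per-image figures can be approximated with a simple cost model: every pre-annotated image is reviewed, and the incorrectly predicted fraction must also be corrected or redrawn. The timings below are illustrative assumptions, not measurements from the project:

```python
def assisted_seconds(accuracy, review_s=10.0, correction_s=200.0):
    """Expected seconds per image with AI pre-annotation:
    a flat review cost, plus correction for the wrong fraction."""
    return review_s + (1.0 - accuracy) * correction_s

# With the assumed timings: 95% accuracy gives ~20 s/image and
# 80% gives ~50 s/image, in line with the ranges in the table
fast = assisted_seconds(0.95)
slow = assisted_seconds(0.80)
```

The model also makes the qualitative point explicit: as accuracy drops, the correction term dominates and the advantage over manual labeling evaporates.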
## Annotation quality management
- Overlap jobs: 10–15% of images are labeled independently by two annotators, and their boxes are then compared by IoU
- Honeypots: images with known ground-truth labels are mixed into the stream to audit each annotator's accuracy
- Consensus annotation: three annotators for hard cases, resolved by majority vote
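The overlap check boils down to pairwise IoU between the two annotators' boxes. A minimal sketch, assuming boxes in `[x1, y1, x2, y2]` format (the function name is ours):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```

A common acceptance rule is to flag any pair of matched boxes with IoU below ~0.7 for a third-party review.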
| Scope of work | Timeline |
|---|---|
| CVAT deployment + basic setup | 1–2 weeks |
| CVAT + AI-assisted annotation | 3–5 weeks |
| Full pipeline: CVAT + QA + CI/CD | 6–10 weeks |