Segmentation Model Training (SAM, U-Net, Mask R-CNN)

Training segmentation models: SAM, U-Net, SegFormer

Segmentation splits into three tasks, each with its own architectural solutions: semantic segmentation (every pixel gets a class), instance segmentation (every object is separated from the others), and panoptic segmentation (both at once). The choice between U-Net, SegFormer and SAM depends on the task, the amount of data, and the latency requirements.
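
The difference between the three tasks is easiest to see in the label format each one expects. The toy arrays below are a minimal sketch of that idea. The `OFFSET` packing for panoptic labels and the `decode_panoptic` helper are assumptions made for illustration (one common convention), not something this article prescribes.

```python
import numpy as np

# Toy 4x4 scene: class 0 = background, class 1 = "cell" (two separate cells).
# Semantic segmentation: every pixel stores only a class id.
semantic = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
])

# Instance segmentation: every object gets its own id (0 = no object).
instance = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 2, 2],
])

# Panoptic segmentation: every pixel carries both a class and an instance id.
# Packing them as class_id * OFFSET + instance_id is an assumed convention.
OFFSET = 1000
panoptic = semantic * OFFSET + instance


def decode_panoptic(panoptic_map, offset=OFFSET):
    """Split a packed panoptic map back into (class map, instance map)."""
    return panoptic_map // offset, panoptic_map % offset


classes, instances = decode_panoptic(panoptic)
print(np.unique(panoptic))
```

The semantic map cannot tell the two cells apart, the instance map can, and the panoptic map keeps both pieces of information per pixel.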

U-Net and its variants: medicine and industry

U-Net remains the standard for medical segmentation not because of raw quality (SegFormer scores higher) but because it stays stable on small datasets (100–500 images) and is easier to interpret.

import torch
import torch.nn as nn
import segmentation_models_pytorch as smp

def build_unet_model(
    architecture: str = 'Unet',        # 'Unet', 'UnetPlusPlus', 'MAnet'
    encoder: str = 'efficientnet-b4',  # backbone
    encoder_weights: str = 'imagenet',
    num_classes: int = 1,              # 1 for binary segmentation
    in_channels: int = 3
) -> nn.Module:
    model = getattr(smp, architecture)(
        encoder_name=encoder,
        encoder_weights=encoder_weights,
        in_channels=in_channels,
        classes=num_classes,
        activation=None   # return raw logits; apply sigmoid/softmax separately
    )
    return model

# Loss for medical segmentation on a small dataset
class CombinedLoss(nn.Module):
    def __init__(self, dice_weight: float = 0.5, bce_weight: float = 0.5):
        super().__init__()
        self.dice_weight = dice_weight
        self.bce_weight  = bce_weight
        self.bce = nn.BCEWithLogitsLoss()
        self.dice = smp.losses.DiceLoss(mode='binary', from_logits=True)

    def forward(self, preds: torch.Tensor,
                targets: torch.Tensor) -> torch.Tensor:
        targets = targets.float()  # BCEWithLogitsLoss expects float targets
        return (self.bce_weight  * self.bce(preds, targets) +
                self.dice_weight * self.dice(preds, targets))
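
To make the two terms of `CombinedLoss` concrete, here is a simplified NumPy sketch of soft Dice and BCE computed on logits. It is an illustration under assumptions (a single global reduction over all pixels and a hypothetical `eps` smoothing term), not the `segmentation_models_pytorch` implementation. The toy case at the end shows why a Dice term is usually added for imbalanced masks: a model that predicts background everywhere gets a fairly low BCE but a Dice loss close to 1.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def soft_dice_loss(logits, targets, eps=1e-7):
    """1 - Dice, with probabilities used in place of hard predictions."""
    probs = sigmoid(logits)
    intersection = np.sum(probs * targets)
    cardinality = np.sum(probs) + np.sum(targets)
    return 1.0 - (2.0 * intersection) / (cardinality + eps)


def bce_with_logits(logits, targets):
    """Numerically stable BCE: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    per_pixel = (np.maximum(logits, 0) - logits * targets
                 + np.log1p(np.exp(-np.abs(logits))))
    return float(np.mean(per_pixel))


def combined_loss(logits, targets, dice_weight=0.5, bce_weight=0.5):
    return (bce_weight * bce_with_logits(logits, targets)
            + dice_weight * soft_dice_loss(logits, targets))


# Imbalanced toy mask: 4 foreground pixels out of 100.
targets = np.zeros((10, 10))
targets[4:6, 4:6] = 1.0

# A degenerate model that confidently predicts background everywhere.
all_background = np.full((10, 10), -5.0)
# A near-perfect model: large positive logits on foreground, negative elsewhere.
near_perfect = np.where(targets == 1.0, 10.0, -10.0)

print(bce_with_logits(all_background, targets),
      soft_dice_loss(all_background, targets))
print(combined_loss(near_perfect, targets))
```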

SegFormer: semantic segmentation

On most semantic segmentation tasks, SegFormer-B4 outperforms U-Net once the dataset exceeds 1,000 images:

from transformers import SegformerForSemanticSegmentation
import torch
import torch.nn.functional as F

def train_segformer(
    num_labels: int,
    id2label: dict,
    label2id: dict,
    pretrained_model: str = 'nvidia/mit-b4',
    learning_rate: float = 6e-5,
    num_epochs: int = 50
) -> tuple:
    """Build the model, optimizer and LR scheduler; the training loop itself is not shown."""

    model = SegformerForSemanticSegmentation.from_pretrained(
        pretrained_model,
        num_labels=num_labels,
        id2label=id2label,
        label2id=label2id,
        ignore_mismatched_sizes=True
    )

    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=learning_rate,
        weight_decay=0.01
    )

    # Poly LR decay is the standard schedule for semantic segmentation;
    # with total_iters=num_epochs, call scheduler.step() once per epoch
    scheduler = torch.optim.lr_scheduler.PolynomialLR(
        optimizer,
        total_iters=num_epochs,
        power=0.9
    )

    return model, optimizer, scheduler

def segformer_inference(
    model: SegformerForSemanticSegmentation,
    pixel_values: torch.Tensor,    # (B, 3, H, W)
    target_size: tuple = None      # (H, W) of the original image
) -> torch.Tensor:
    """
    SegFormer outputs logits at 1/4 of the input resolution.
    They need bilinear upsampling to the original size.
    """
    outputs = model(pixel_values=pixel_values)
    logits = outputs.logits  # (B, num_labels, H/4, W/4)

    if target_size is not None:
        logits = F.interpolate(
            logits,
            size=target_size,
            mode='bilinear',
            align_corners=False
        )
    return logits
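
The polynomial schedule configured above follows a simple closed form, lr = base_lr * (1 - t / T) ** power. The pure-Python sketch below reproduces that formula so the shape of the decay can be inspected without running a training job. The helper name `poly_lr` is hypothetical, and the formula is written to match the documented behaviour of `PolynomialLR` as I understand it, so treat it as an approximation rather than a reference implementation.

```python
def poly_lr(base_lr: float, step: int, total_iters: int,
            power: float = 0.9) -> float:
    """Closed form of polynomial decay: base_lr * (1 - t / T) ** power."""
    t = min(step, total_iters)  # the rate stays at 0 once total_iters is reached
    return base_lr * (1.0 - t / total_iters) ** power


# Learning rate at every 10th epoch for the defaults used above
# (base_lr=6e-5, 50 epochs, power=0.9).
schedule = [poly_lr(6e-5, epoch, total_iters=50) for epoch in range(0, 51, 10)]
for epoch, lr in zip(range(0, 51, 10), schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.2e}")
```

With `power=0.9` the curve is close to linear, dropping slightly faster near the end of training.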

SAM2 fine-tuning for custom domains

Out of the box, SAM2 does not know domain-specific classes (microscopy, industrial defects). Fine-tuning only the mask decoder head is an efficient approach:

from sam2.build_sam import build_sam2
import torch

def finetune_sam2_decoder(
    checkpoint_path: str,
    num_epochs: int = 30,
    learning_rate: float = 1e-4,
    freeze_image_encoder: bool = True,   # the encoder is heavy, so freeze it
    freeze_prompt_encoder: bool = True
) -> tuple:

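    # Config name for the large Hiera model in the original SAM2 release;
    # adjust it to match the installed sam2 version
    sam2 = build_sam2(
        'sam2_hiera_l.yaml',
        checkpoint_path,
        device='cuda'
    )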

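    # Freeze everything first (memory modules included), then unfreeze selectively.
    # In the sam2 package the SAM heads are named sam_prompt_encoder / sam_mask_decoder.
    for param in sam2.parameters():
        param.requires_grad = False
    for param in sam2.image_encoder.parameters():
        param.requires_grad = not freeze_image_encoder
    for param in sam2.sam_prompt_encoder.parameters():
        param.requires_grad = not freeze_prompt_encoder

    # The mask decoder is always trained
    for param in sam2.sam_mask_decoder.parameters():
        param.requires_grad = True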

    trainable_params = sum(
        p.numel() for p in sam2.parameters() if p.requires_grad
    )
    print(f'Trainable parameters: {trainable_params:,}')
    # For SAM2-Large with a frozen encoder: ~4M of 224M

    optimizer = torch.optim.AdamW(
        filter(lambda p: p.requires_grad, sam2.parameters()),
        lr=learning_rate,
        weight_decay=1e-4
    )

    return sam2, optimizer

Segmentation metrics

Metric	Formula	When to use
mIoU	mean over classes of TP/(TP+FP+FN)	Semantic segmentation
Dice	2TP/(2TP+FP+FN)	Medical imaging, class imbalance
Boundary IoU	IoU computed only on object boundaries	Contour accuracy
PQ (Panoptic Quality)	SQ × RQ	Panoptic segmentation
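
The formulas in the table can be checked on a toy example. The sketch below computes per-class IoU and Dice from a confusion matrix and PQ from a list of already matched segments. The function names are hypothetical, and the PQ helper assumes the segment matching (IoU > 0.5) has been done elsewhere, so it is an illustration of the arithmetic rather than a full evaluation pipeline.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou_dice(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp          # belongs to the class, but missed
    # Classes absent from both maps get 0 here; real pipelines often skip them.
    iou = tp / np.maximum(tp + fp + fn, 1)
    dice = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return iou, dice


def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = SQ * RQ, given IoUs of matched segments and unmatched counts."""
    tp = len(matched_ious)
    if tp == 0:
        return 0.0
    sq = sum(matched_ious) / tp                       # segmentation quality
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)      # recognition quality
    return sq * rq


y_true = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
y_pred = np.array([[0, 1, 1, 1],
                   [0, 0, 1, 0]])

cm = confusion_matrix(y_true, y_pred, num_classes=2)
iou, dice = per_class_iou_dice(cm)
miou = iou.mean()
pq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
print(cm, iou, dice, miou, pq)
```

For a single class the two overlap metrics are tied by Dice = 2·IoU / (1 + IoU), so Dice is always at least as large as IoU on the same prediction.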

Architecture comparison

Model	mIoU (ADE20K)	Latency (640 px)	Training VRAM	Small dataset
U-Net (EfficientNet-B4)	42.1	8 ms	6 GB	Excellent
SegFormer-B2	46.5	15 ms	8 GB	Good
SegFormer-B4	50.3	28 ms	12 GB	Good
Mask2Former	56.1	45 ms	16 GB	Poor
SAM2 (fine-tuned)	n/a	60 ms	20 GB	Excellent

Timelines

Task	Timeline
Fine-tuning U-Net (data already prepared)	2–3 weeks
SAM2 fine-tuning for a specific domain	3–5 weeks
Complete semantic segmentation system	5–9 weeks