Object Detection System Development
Object detection is the task of simultaneously localizing (bounding box) and classifying objects in an image. In a single forward pass, one model produces box coordinates, an object class, and a confidence score for each detection. Applications include product counting on shelves, defect detection on conveyor lines, vehicle recognition, and human detection in video.
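A detector's per-image output can be modeled as a list of (box, class, score) records. A minimal sketch of that structure (the `Detection` class and the sample values are illustrative, not from any specific library):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple       # (x1, y1, x2, y2) in pixel coordinates
    class_name: str  # predicted category
    score: float     # confidence in [0, 1]

# One forward pass yields a list of such detections per image
preds = [
    Detection((120, 40, 310, 220), 'person', 0.91),
    Detection((400, 180, 520, 260), 'car', 0.78),
]
```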
Detector Selection
YOLOv8/YOLO11 — optimal choice for most tasks. Ultralytics implementation with good documentation, active support, and built-in export to TensorRT/ONNX.
RT-DETR (Real-Time Detection Transformer) — transformer-based detector, better quality at comparable speed to YOLOv8.
Grounding DINO — open-vocabulary detection: finds objects by text description without retraining. Useful for prototyping and rare category tasks.
| Model | [email protected] COCO | FPS (T4) | Parameters |
|---|---|---|---|
| YOLOv8n | 52.9 | 320 | 3.2M |
| YOLOv8l | 64.9 | 87 | 43.7M |
| YOLO11m | 64.0 | 183 | 20.1M |
| RT-DETR-L | 65.6 | 74 | 32M |
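Model choice usually reduces to: meet the latency budget first, then maximize accuracy. A hypothetical helper that encodes the table above (T4 benchmark numbers) and picks the highest-mAP model satisfying an FPS requirement:

```python
# Benchmark numbers copied from the table above: ([email protected], FPS on T4, params in M)
BENCHMARKS = {
    'yolov8n':  (52.9, 320, 3.2),
    'yolo11m':  (64.0, 183, 20.1),
    'yolov8l':  (64.9, 87, 43.7),
    'rtdetr-l': (65.6, 74, 32.0),
}

def pick_model(min_fps):
    """Return the highest-mAP model that still meets the FPS requirement."""
    viable = [(name, stats) for name, stats in BENCHMARKS.items()
              if stats[1] >= min_fps]
    if not viable:
        raise ValueError(f'no model reaches {min_fps} FPS on a T4')
    return max(viable, key=lambda item: item[1][0])[0]
```

For example, a 100 FPS budget rules out YOLOv8l and RT-DETR-L, leaving YOLO11m as the most accurate viable option.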
Fine-tuning for Custom Classes
```python
from ultralytics import YOLO

# Load pretrained model
model = YOLO('yolov8l.pt')

# Train on custom dataset
results = model.train(
    data='dataset.yaml',    # dataset config path
    epochs=100,
    imgsz=640,
    batch=16,
    optimizer='AdamW',
    lr0=0.001,
    lrf=0.01,               # final LR = lr0 * lrf
    weight_decay=0.0005,
    degrees=10.0,           # rotation augmentation
    mosaic=1.0,             # mosaic augmentation
    device=0,
)
```
dataset.yaml structure:

```yaml
path: /data/myproject
train: images/train
val: images/val
test: images/test
nc: 5  # number of classes
names: ['cat', 'dog', 'car', 'person', 'bicycle']
```
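A common failure mode is `nc` drifting out of sync with `names` as classes are added. A minimal sanity check (the `config` dict mirrors the dataset.yaml above; the `validate` helper is illustrative):

```python
# Same fields as dataset.yaml, represented as a dict for checking
config = {
    'path': '/data/myproject',
    'train': 'images/train',
    'val': 'images/val',
    'test': 'images/test',
    'nc': 5,
    'names': ['cat', 'dog', 'car', 'person', 'bicycle'],
}

def validate(cfg):
    """Ensure the declared class count matches the names list."""
    assert cfg['nc'] == len(cfg['names']), \
        f"nc={cfg['nc']} but {len(cfg['names'])} names listed"
    return True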
Detection-Specific Augmentation
Detection requires specific augmentation — transformations must correctly apply to bounding boxes:
- Mosaic — combining 4 images into one, increases context diversity
- MixUp — blending two images with weights
- Copy-Paste — cutting objects and pasting in new context
- Random crop preserving objects in frame
- Albumentations: HorizontalFlip, RandomBrightnessContrast, GaussNoise
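The key point for detection augmentation is that box coordinates must be transformed along with pixels. A sketch of mosaic on toy data (images as 2D lists, boxes as `(x1, y1, x2, y2, class)` tuples; real pipelines operate on arrays):

```python
def mosaic(images, boxes_per_image):
    """Combine 4 equally sized images into a 2x2 mosaic,
    shifting each image's boxes into mosaic coordinates."""
    h, w = len(images[0]), len(images[0][0])
    offsets = [(0, 0), (0, w), (h, 0), (h, w)]  # (dy, dx) of each tile
    canvas = [[0] * (2 * w) for _ in range(2 * h)]
    out_boxes = []
    for img, boxes, (dy, dx) in zip(images, boxes_per_image, offsets):
        for y in range(h):
            for x in range(w):
                canvas[dy + y][dx + x] = img[y][x]
        # Boxes move by the same offset as their source image
        for (x1, y1, x2, y2, cls) in boxes:
            out_boxes.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy, cls))
    return canvas, out_boxes
```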
Detection Metrics
- [email protected] — mean Average Precision at IoU threshold 0.5
- [email protected]:0.95 — stricter: average mAP at IoU from 0.5 to 0.95 with 0.05 step
- Precision / Recall at specific confidence threshold
- FPS / latency — for real-time systems
Confidence threshold selection: plot the precision-recall curve on a validation set and choose the threshold that gives an acceptable precision/recall trade-off for the specific application.
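Threshold selection can be sketched as follows: given scored detections matched against ground truth, compute precision/recall at each candidate threshold and take the lowest (best-recall) threshold that still meets a precision floor. The helpers and data format here are illustrative:

```python
def pr_at_threshold(detections, n_gt, thr):
    """Precision/recall for detections [(score, is_true_positive), ...]
    against n_gt ground-truth objects, at confidence threshold thr."""
    kept = [tp for score, tp in detections if score >= thr]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_gt
    return precision, recall

def pick_threshold(detections, n_gt, min_precision):
    """Lowest threshold (hence best recall) meeting the precision floor."""
    for thr in sorted({score for score, _ in detections}):
        precision, _ = pr_at_threshold(detections, n_gt, thr)
        if precision >= min_precision:
            return thr
    return None  # no threshold achieves the required precision
```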
NMS and Post-processing
Non-Maximum Suppression (NMS) removes duplicate detections of the same object. Key parameters: IoU threshold (typically 0.45–0.7) and confidence threshold (0.25–0.5). For densely packed objects, consider Soft-NMS or class-agnostic NMS.
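Greedy NMS itself is short: repeatedly keep the highest-scoring box and drop any remaining box that overlaps it above the IoU threshold. A self-contained sketch (frameworks ship optimized versions, e.g. batched per-class variants):

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thr=0.45):
    """Greedy NMS: keep highest-scoring boxes, suppress overlaps.
    Returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep
```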
Deployment
TensorRT engine for NVIDIA GPU: export from Ultralytics with one command. ONNX for CPU deployment. For Raspberry Pi / Jetson: YOLO11n in TFLite / ONNX.
| Task | Timeline |
|---|---|
| Detection of 1–5 classes, sufficient data | 1–3 weeks |
| Detection of 20+ classes, data collection | 4–7 weeks |
| Detection in challenging conditions (night, fog) | 6–10 weeks |