Video Object Tracking System Development
Tracking is the task of following a specific object through a sequence of frames while preserving its identity. If detection answers "what is in the frame and where", tracking adds "this is the same object as in previous frames". Applications include counting people crossing a line, analyzing customer trajectories in stores, autonomous-driving control, and sports analytics.
Tracking Algorithms
SORT (Simple Online and Realtime Tracking) — basic algorithm: Kalman filter for position prediction + IoU matching for association. Fast, but loses objects in occlusions.
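SORT's association step can be sketched as an assignment problem on IoU: the cost between each Kalman-predicted box and each new detection is 1 − IoU, solved with the Hungarian algorithm. A minimal sketch — the box format and the 0.3 IoU gate are illustrative assumptions, not SORT's exact configuration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(predicted, detections, iou_threshold=0.3):
    """Match predicted track boxes to detections by maximum IoU.

    Returns (track_idx, det_idx) pairs whose IoU clears the gate."""
    cost = np.array([[1.0 - iou(p, d) for d in detections] for p in predicted])
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]
```

Unmatched tracks keep coasting on the Kalman prediction for a few frames; unmatched detections spawn new tracks.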
DeepSORT — SORT + ReID (Re-Identification): deep appearance features for association by visual appearance, not just spatial position. Better handles occlusions.
ByteTrack — a strong modern default for general tasks. Its key idea: use all detections, including low-confidence ones, for association:
```python
from ultralytics import YOLO

model = YOLO('yolov8l.pt')

# Tracking is built into Ultralytics
results = model.track(
    source='video.mp4',
    tracker='bytetrack.yaml',
    persist=True,   # preserve track IDs between frames
    conf=0.3,
    iou=0.5,
    stream=True,    # yield results frame by frame instead of accumulating
)

for result in results:
    for box in result.boxes:
        if box.id is None:  # detection not yet assigned a track
            continue
        track_id = int(box.id.item())  # unique object ID
        x1, y1, x2, y2 = box.xyxy[0].tolist()
```
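The key ByteTrack trick — keeping low-confidence detections for a second association pass instead of discarding them — can be sketched as a confidence split. Thresholds here are illustrative assumptions, not values read from `bytetrack.yaml`:

```python
import numpy as np

# Illustrative thresholds; real values live in the tracker config
HIGH_THRESH, LOW_THRESH = 0.5, 0.1

def split_detections(detections: np.ndarray):
    """Split (x1, y1, x2, y2, score) rows into ByteTrack's two tiers.

    High-confidence detections are matched to tracks first; the
    low-confidence remainder gets a second association pass against
    still-unmatched tracks, which is what keeps tracks alive through
    partial occlusion."""
    scores = detections[:, 4]
    high = detections[scores >= HIGH_THRESH]
    low = detections[(scores >= LOW_THRESH) & (scores < HIGH_THRESH)]
    return high, low
```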
BoT-SORT — ByteTrack + camera motion compensation + ReID. Among the top results on the MOT17 benchmark.
StrongSORT — more aggressive ReID integration, better for long occlusion tasks.
ReID Models
A ReID model extracts an appearance embedding for each object. When a track is lost, the system re-acquires it by nearest-neighbor search over embeddings:
```python
import numpy as np
import torch
import torchreid

# Load a ReID model pretrained on Market-1501 (751 identities)
model = torchreid.models.build_model(
    name='osnet_x1_0',
    num_classes=751,  # Market-1501 identity count
    pretrained=True,
)
model.eval()

def extract_appearance_features(crop: np.ndarray) -> np.ndarray:
    """Return an appearance embedding for a detected object crop."""
    tensor = preprocess_crop(crop)  # resize + normalize to model input
    with torch.no_grad():
        features = model(tensor)
    return features.cpu().numpy()
```
ReID metrics: mAP and Rank-1 on Market-1501 / DukeMTMC-reID. OSNet-x1.0: Rank-1 94.8%, mAP 84.9% on Market-1501.
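Re-acquiring a lost track then reduces to nearest-neighbor search over stored embeddings. A minimal sketch with a hypothetical `best_reid_match` helper, assuming cosine similarity on L2-normalized vectors and an illustrative 0.6 acceptance threshold:

```python
import numpy as np

def best_reid_match(query: np.ndarray, gallery: np.ndarray,
                    threshold: float = 0.6):
    """Return index of the most similar gallery embedding, or None.

    Embeddings are L2-normalized, so cosine similarity is a dot product.
    The threshold rejects matches that are merely "least dissimilar"."""
    q = query / (np.linalg.norm(query) + 1e-9)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-9)
    sims = g @ q
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None
```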
Trajectory Analysis
After tracking, build trajectory analytics:
```python
from collections import defaultdict

class TrajectoryAnalyzer:
    def __init__(self):
        # track_id -> list of (frame, x, y) center points
        self.tracks = defaultdict(list)

    def update(self, track_id, frame_num, cx, cy):
        self.tracks[track_id].append((frame_num, cx, cy))

    def count_line_crossings(self, line: tuple, direction='both') -> int:
        """Count tracks that cross a virtual line."""
        return sum(
            1 for track in self.tracks.values()
            if self._crosses_line(track, line, direction)
        )
```
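The `_crosses_line` helper is not shown above. One way to implement it (a sketch that ignores the `direction` argument) is a segment-intersection test between consecutive trajectory points and the virtual line:

```python
def _segments_intersect(p1, p2, p3, p4):
    """True if segment p1-p2 properly intersects segment p3-p4."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    return ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0))

def crosses_line(track, line):
    """track: list of (frame, x, y); line: ((x1, y1), (x2, y2)).

    True if any step between consecutive track points crosses the line."""
    pts = [(x, y) for _, x, y in track]
    a, b = line
    return any(_segments_intersect(pts[i], pts[i + 1], a, b)
               for i in range(len(pts) - 1))
```

Crossing direction, if needed, can be recovered from the sign of the same cross products.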
Tracking Quality Metrics
- HOTA (Higher Order Tracking Accuracy) — primary metric, balances Detection and Association accuracy
- MOTA (Multiple Object Tracking Accuracy) — accounts for FP, FN, ID switches
- IDF1 — ID F1 score: how well IDs are preserved over time
- ID Switches — count of ID changes for one object
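Of these, MOTA has the simplest closed form; a one-function sketch (function name is ours):

```python
def mota(fp: int, fn: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FP + FN + IDSW) / total ground-truth objects.

    Can go negative when the error count exceeds the number of
    ground-truth objects."""
    return 1.0 - (fp + fn + id_switches) / num_gt
```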
| Algorithm | HOTA (MOT17) | MOTA | ID Switches |
|---|---|---|---|
| SORT | 55.1 | 63.3 | 4852 |
| DeepSORT | 61.2 | 71.4 | 1821 |
| ByteTrack | 77.3 | 80.3 | 2196 |
| BoT-SORT | 77.8 | 80.5 | 1871 |
Development Timelines
| System Scale | Timeline |
|---|---|
| Tracking 1 class, 1–4 cameras | 2–3 weeks |
| Multiclass, trajectory analysis | 4–6 weeks |
| Long-term ReID tracking (re-enter) | 6–10 weeks |