Development of an AI system for transport and unmanned systems
Autonomous transport is one of the most capital-intensive and technically complex applications of AI. The technology stack spans computer vision, trajectory planning, real-time decision-making, and safety systems. Industrial autonomous vehicles (mining dump trucks, port AGVs, warehouse robots) are already in serial operation.
Autonomous Driving Stack
Perception
Sensor array: LiDAR, cameras, radar, ultrasonic. Each sensor has its own strengths and weaknesses:
| Sensor | Strengths | Weaknesses |
|---|---|---|
| LiDAR (Velodyne HDL-64E) | Precise 3D map, works at night | Degrades in rain/snow, expensive |
| Camera | Sign/lane-marking recognition, cheap | No native 3D, sensitive to lighting |
| Radar (77 GHz) | Measures object velocity, works in a blizzard | Low resolution |
| Ultrasonic | Near field, parking | Range only <5 m |
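Because the sensors above have complementary error characteristics, their measurements of the same quantity are often combined by weighting each by its confidence. A minimal sketch of inverse-variance fusion for a radar and a camera range estimate (the numbers are hypothetical, for illustration only):

```python
import numpy as np

def fuse_measurements(values, variances):
    """Inverse-variance weighted fusion of independent estimates
    of the same quantity (e.g., distance to the lead vehicle)."""
    values = np.asarray(values, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    fused = np.sum(weights * values) / np.sum(weights)
    fused_var = 1.0 / np.sum(weights)   # always <= the best single sensor
    return fused, fused_var

# Radar: precise range (sigma^2 = 0.04 m^2); camera: coarse (sigma^2 = 1.0 m^2)
dist, var = fuse_measurements([25.3, 26.1], [0.04, 1.0])
```

The fused estimate lands close to the radar reading, and the fused variance is lower than either input — the core reason multi-sensor stacks outperform any single modality.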
Fusion Architecture:
```python
import torch
import torchvision
from torch import nn

class SensorFusionNetwork(nn.Module):
    """
    Early/late fusion: LiDAR point cloud + camera image -> 3D detection.
    LiDAR branch: PointPillars (faster than PointNet for LiDAR);
    camera branch: ResNet-50. PillarVFE, CamToBEVProjection,
    AnchorFreeDetectionHead and scatter_to_bev are project-specific
    components (not shown here).
    """
    def __init__(self, n_classes=10):
        super().__init__()
        # LiDAR branch: PointPillars -> BEV feature map
        self.pillar_vfe = PillarVFE(in_channels=9, out_channels=64)
        self.lidar_backbone = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(),
        )
        # Camera branch: ResNet-50 backbone -> projection to BEV
        self.camera_backbone = torchvision.models.resnet50(pretrained=True)
        self.cam_to_bev = CamToBEVProjection(intrinsics=None)
        # Fusion: concat 256 LiDAR + 256 camera channels -> 256
        self.fusion_conv = nn.Conv2d(512, 256, 1)
        # Detection head
        self.det_head = AnchorFreeDetectionHead(n_classes=n_classes)

    def extract_camera_features(self, imgs):
        """Run ResNet-50 from the stem through layer3 (stride-16 features)."""
        m = self.camera_backbone
        x = m.maxpool(m.relu(m.bn1(m.conv1(imgs))))
        return m.layer3(m.layer2(m.layer1(x)))

    def forward(self, lidar_pillars, camera_imgs, lidar_indices):
        # LiDAR: pillar features scattered onto the BEV grid
        lidar_features = self.pillar_vfe(lidar_pillars)
        bev_lidar = self.scatter_to_bev(lidar_features, lidar_indices)
        lidar_out = self.lidar_backbone(bev_lidar)
        # Camera: intermediate ResNet features lifted to BEV
        cam_features = self.extract_camera_features(camera_imgs)
        bev_cam = self.cam_to_bev(cam_features)
        # Concat & fuse
        fused = torch.cat([lidar_out, bev_cam], dim=1)
        fused = self.fusion_conv(fused)
        return self.det_head(fused)
```
Semantic Segmentation:
BEV (Bird's Eye View) segmentation: each voxel/pixel is assigned a class (road, pedestrian, car, lane marking). For urban environments: SegFormer-B5; for highways: lighter architectures running at 99+ FPS.
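The BEV representation starts from rasterizing sensor returns into a top-down grid that the segmentation network then labels. A minimal sketch of the rasterization step (ranges and cell size are illustrative assumptions):

```python
import numpy as np

def points_to_bev(points, x_range=(0, 50), y_range=(-25, 25), cell=0.5):
    """Rasterize LiDAR (x, y) points into a BEV occupancy grid.
    Grid shape: (H, W) = (x bins, y bins); 1 = at least one return."""
    H = int((x_range[1] - x_range[0]) / cell)
    W = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((H, W), dtype=np.uint8)
    xi = ((points[:, 0] - x_range[0]) / cell).astype(int)
    yi = ((points[:, 1] - y_range[0]) / cell).astype(int)
    mask = (xi >= 0) & (xi < H) & (yi >= 0) & (yi < W)  # drop out-of-range returns
    grid[xi[mask], yi[mask]] = 1
    return grid

pts = np.array([[10.0, 0.0], [10.2, 0.1], [60.0, 0.0]])  # third point out of range
bev = points_to_bev(pts)
```

Real pipelines keep richer per-cell features (height, intensity, point count) rather than plain occupancy, but the grid geometry is the same.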
Prediction (behavior forecasting)
Detecting an object is only half the task. More important is predicting where it will move in the next 3–5 seconds.
Social Force Model + LSTM:
Pedestrians interact socially: they avoid collisions and follow groups. Social LSTM captures these interactions by pooling the hidden states of neighboring agents.
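The pooling idea can be illustrated without the full LSTM machinery: for each agent, aggregate the hidden states of nearby agents so its next-step prediction is conditioned on its neighbors. A simplified sketch (mean pooling within a radius instead of the original grid-based pooling; all names are illustrative):

```python
import numpy as np

def social_pooling(positions, hidden, radius=2.0):
    """Simplified Social-LSTM-style pooling: average the hidden
    states of neighbors within `radius` meters of each agent.
    positions: (N, 2), hidden: (N, D) -> pooled: (N, D)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)       # (N, N) pairwise distances
    neighbor = (dists < radius) & (dists > 0)   # exclude self
    pooled = np.zeros_like(hidden)
    for i in range(len(positions)):
        idx = np.where(neighbor[i])[0]
        if idx.size:
            pooled[i] = hidden[idx].mean(axis=0)
    return pooled

pos = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])  # agent 2 is isolated
h = np.eye(3)                                            # toy hidden states
p = social_pooling(pos, h)
```

Agents 0 and 1 see each other's state; the isolated agent gets a zero social context.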
Transformer-based trajectory prediction:
Wayformer, MotionTransformer: attention across all agents and map elements → multi-modal probabilistic trajectory prediction (e.g., 6 trajectory hypotheses, each with a probability).
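Multi-modal predictors are typically evaluated with minADE: the average displacement error of the best of the K hypotheses, which avoids penalizing a model for covering several plausible futures. A minimal sketch of the metric (shapes and the toy data are illustrative):

```python
import numpy as np

def min_ade(pred_modes, gt):
    """minADE over K hypotheses: mean displacement error of the best mode.
    pred_modes: (K, T, 2) predicted trajectories, gt: (T, 2) ground truth."""
    errs = np.linalg.norm(pred_modes - gt[None], axis=-1)  # (K, T) per-step errors
    return errs.mean(axis=1).min()

K, T = 6, 10
gt = np.stack([np.linspace(0, 9, T), np.zeros(T)], axis=-1)  # straight-line truth
modes = np.repeat(gt[None], K, axis=0)
modes = modes + np.arange(K)[:, None, None] * 0.5  # each mode drifts further off
```

Here mode 0 matches the ground truth exactly, so minADE is zero even though five of the six hypotheses are offset.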
Planning & Control
Path Planning:
- Global: A* or Dijkstra on an HD map (HERE HD Live Map, TomTom AutoStream)
- Local: Lattice Planner or MPC with obstacle avoidance over a 5–10 s horizon
- Frenet frame: planning in a coordinate system along/across the road
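The Frenet frame decouples progress along the road (s) from lateral offset (d). A minimal sketch of the Cartesian → Frenet projection onto a polyline reference path (function and variable names are illustrative; real planners also handle curvature and heading):

```python
import numpy as np

def cartesian_to_frenet(point, path):
    """Project a 2D point onto a polyline reference path.
    Returns (s, d): arc length along the path and signed lateral offset.
    path: (M, 2) waypoints, assumed densely sampled."""
    seg = path[1:] - path[:-1]                        # (M-1, 2) segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    rel = point[None] - path[:-1]                     # point relative to each segment start
    t = np.clip((rel * seg).sum(1) / seg_len**2, 0, 1)
    proj = path[:-1] + t[:, None] * seg               # closest point on each segment
    d2 = np.linalg.norm(point[None] - proj, axis=1)
    i = int(np.argmin(d2))                            # nearest segment
    s = seg_len[:i].sum() + t[i] * seg_len[i]
    # sign of d: left of the travel direction is positive
    cross = seg[i, 0] * (point[1] - path[i, 1]) - seg[i, 1] * (point[0] - path[i, 0])
    d = np.sign(cross) * d2[i]
    return s, d

path = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
s, d = cartesian_to_frenet(np.array([5.0, 2.0]), path)
```

For a point 5 m along a straight road and 2 m to its left, this returns s = 5, d = +2, which is the representation the lattice planner samples over.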
MPC Controller:
```python
import numpy as np

class VehicleMPC:
    """
    Model Predictive Control for a ground vehicle.
    State: [x, y, yaw, v]; control: [steering, acceleration].
    """
    def __init__(self, dt=0.1, horizon=20):
        self.dt = dt
        self.N = horizon

    def bicycle_model(self, state, control, L=2.7):
        """Kinematic bicycle model (L = wheelbase in meters)."""
        x, y, yaw, v = state
        delta, a = control  # steering angle, acceleration
        x_new = x + v * np.cos(yaw) * self.dt
        y_new = y + v * np.sin(yaw) * self.dt
        yaw_new = yaw + v / L * np.tan(delta) * self.dt
        v_new = v + a * self.dt
        return np.array([x_new, y_new, yaw_new, np.clip(v_new, 0, 30)])
```
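MPC uses this model by rolling out candidate control sequences over the horizon and scoring the resulting trajectories against a cost (tracking error, comfort, constraints). A minimal sketch of the rollout step, restating the same kinematic bicycle model as a standalone function (helper names are illustrative; the optimizer itself is omitted):

```python
import numpy as np

def bicycle_step(state, control, dt=0.1, L=2.7):
    """One step of the kinematic bicycle model from VehicleMPC."""
    x, y, yaw, v = state
    delta, a = control
    return np.array([
        x + v * np.cos(yaw) * dt,
        y + v * np.sin(yaw) * dt,
        yaw + v / L * np.tan(delta) * dt,
        np.clip(v + a * dt, 0, 30),
    ])

def rollout(state, controls):
    """Predict the trajectory produced by a control sequence."""
    traj = [state]
    for u in controls:
        traj.append(bicycle_step(traj[-1], u))
    return np.array(traj)

# 20-step horizon (2 s at dt=0.1): straight driving at a constant 10 m/s
traj = rollout(np.array([0.0, 0.0, 0.0, 10.0]), [(0.0, 0.0)] * 20)
```

The optimizer then picks the control sequence whose rollout minimizes the cost, applies only its first control, and re-solves at the next timestep (receding horizon).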
Industrial applications
Mining dump trucks (Autonomous Haulage System):
- Komatsu FrontRunner, Caterpillar MineStar: in serial operation since 2013
- Primary sensors: RTK GPS + LiDAR + radar
- Productivity: +15% vs. manned trucks (no fatigue, optimal speeds)
Port AGVs (Automated Guided Vehicles):
- Rotterdam, Hamburg terminals: 100% AGV container handling
- Navigation: QR codes on the floor + LiDAR
- Throughput: +20–25% stacking density in the same area
Warehouses (AMR — Autonomous Mobile Robots):
- Geek+, 6 River Systems, Fetch Robotics
- ROS 2 (Robot Operating System) as middleware
- Dynamic route planning based on SLAM (Simultaneous Localization and Mapping)
Safety & Certification
Functional Safety (ISO 26262 / SOTIF ISO 21448):
- ASIL D: the highest integrity level, required for critical control systems
- Redundancy: duplicated ECUs, independent emergency brake
- Formal verification: critical algorithms are verified using formal methods
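One classic redundancy pattern behind these requirements is the 2-out-of-3 voter: a value is trusted only if at least two independent channels agree, otherwise the system falls back to a safe state. A toy sketch of the principle (this illustrates the voting logic only, not an ISO 26262-compliant implementation; tolerance and values are hypothetical):

```python
def two_out_of_three(a, b, c, tol=0.5):
    """2oo3 voter: accept a sensor value only if at least two channels
    agree within `tol`; return None (fault) otherwise, so the system
    can degrade to a safe state (e.g., emergency braking)."""
    pairs = [(a, b), (a, c), (b, c)]
    agreeing = [(x + y) / 2 for x, y in pairs if abs(x - y) <= tol]
    if not agreeing:
        return None  # no quorum -> fault
    return agreeing[0]

v = two_out_of_three(10.0, 10.2, 55.0)  # third channel clearly faulty
```

With one faulty channel the vote still yields a usable value; when all three disagree, the voter signals a fault instead of guessing.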
Development time: 12–24 months for a full autonomous-vehicle stack; industrial AGVs/robots: 6–12 months.