AI System for Last-Mile Delivery Robots
Last-mile delivery robots operate in a fundamentally different environment than warehouse robots: unstructured urban sidewalks, surface irregularities, ramps, intersections, and unpredictable pedestrians. This makes the task significantly more complex and calls for a different approach to perception and decision-making.
Perception Stack
A delivery robot (of the Starship, Kiwibot, or Yandex Rover type) uses multiple sensing modalities:
Sensor package:
- 9-12 cameras for 360° view (fisheye, 1-2 Mpixel)
- 2-4 LiDAR (Livox Mid-360 or custom solid-state)
- Ultrasonic sensors for close range (< 1 m)
- RTK GPS for global localization + Visual SLAM for precise positioning
Object detection:
- YOLOv8 / RT-DETR for pedestrians, bicycles, vehicles
- Semantic segmentation (SegFormer) for surface classification: asphalt, grass, curb, puddle
- Depth estimation from stereo or monocular (UniDepth, DPT)
Everything runs on an edge computer: NVIDIA Jetson Orin NX or similar, with TensorRT optimization to reach 30+ FPS per stream.
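The detectors above emit per-camera results that must be merged into a single robot-frame object list before planning. A minimal sketch of that fusion step is below; the `Detection` structure, field names, and confidence threshold are illustrative assumptions, not from any vendor SDK.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "pedestrian", "bicycle" (illustrative classes)
    confidence: float  # 0..1 score from the detector head
    camera_id: int     # which of the 9-12 cameras produced this hit

def fuse_detections(per_camera, min_conf=0.5):
    """Merge detections from all camera streams, dropping low-confidence hits.

    per_camera: dict mapping camera_id -> list of (label, confidence) tuples.
    """
    fused = []
    for cam_id, dets in per_camera.items():
        for label, conf in dets:
            if conf >= min_conf:
                fused.append(Detection(label, conf, cam_id))
    # Highest-confidence objects first, for downstream prioritization
    return sorted(fused, key=lambda d: d.confidence, reverse=True)
```

In production this step would also deduplicate objects seen by overlapping cameras (e.g. via 3D position clustering); that is omitted here for brevity.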
Navigation in Urban Environment
Global route: planned over an HD map of sidewalks (OSM + custom annotation). A graph of passable segments with attributes: sidewalk width, surface type, curb presence, and nighttime lighting.
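Such a segment graph can be searched with a standard shortest-path algorithm where edge cost reflects the attributes, not just length. A hedged sketch, assuming a simple multiplicative penalty model (the penalty values and attribute set are illustrative):

```python
import heapq

def edge_cost(length_m, width_m, surface):
    """Segment cost = length penalized for poor surface and narrow width."""
    surface_penalty = {"asphalt": 1.0, "gravel": 1.3, "cobblestone": 1.6}[surface]
    width_penalty = 1.0 if width_m >= 1.5 else 2.0  # narrow sidewalks cost double
    return length_m * surface_penalty * width_penalty

def plan_route(graph, start, goal):
    """Dijkstra over the sidewalk graph.

    graph[u] = [(v, length_m, width_m, surface), ...]; assumes goal is reachable.
    Returns (node path, total cost).
    """
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, length, width, surface in graph.get(u, []):
            nd = d + edge_cost(length, width, surface)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]
```

Note how the width penalty changes routing: a shorter but narrow segment can lose to a longer, wide one.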
Local planner: this is where RL comes in. The agent is trained in Isaac Sim with photorealistic urban scenes (NVIDIA Omniverse). The task: over a 10-second horizon, select a trajectory that avoids collisions with pedestrians and obstacles.
Algorithm: TD3 (Twin Delayed DDPG) for a continuous velocity space. Input tensor: a Bird's Eye View (BEV) of 64×64 m around the robot with semantic layers, plus a state vector.
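To make the observation concrete, here is a sketch of rasterizing obstacles into a robot-centric BEV grid and appending the state vector, as a TD3 policy input might be assembled. No RL library is involved; the resolution (1 m/cell), single occupancy layer, and state fields are assumptions for illustration.

```python
def make_bev(obstacles, size_m=64, cell_m=1.0):
    """Rasterize obstacle points into an occupancy grid.

    obstacles: list of (x, y) in robot-centric metres; robot at the grid centre.
    Returns an n x n list-of-lists of 0/1 cells.
    """
    n = int(size_m / cell_m)
    grid = [[0] * n for _ in range(n)]
    half = size_m / 2.0
    for x, y in obstacles:
        col = int((x + half) / cell_m)
        row = int((y + half) / cell_m)
        if 0 <= row < n and 0 <= col < n:  # drop points outside the window
            grid[row][col] = 1
    return grid

def make_observation(grid, speed_mps, heading_rad, goal_dx, goal_dy):
    """Flatten the BEV layer and append the robot state vector."""
    flat = [c for row in grid for c in row]
    return flat + [speed_mps, heading_rad, goal_dx, goal_dy]
```

A real system would stack several semantic layers (pedestrians, drivable surface, curbs) as separate channels rather than a single occupancy plane.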
Working with Unstructured Obstacles
Urban sidewalks are full of edge cases absent from warehouse (WMS) scenarios:
| Situation | Strategy |
|---|---|
| Curb without a ramp | Route around / find a ramp |
| Puddle / snow | Reduce speed, go around |
| Construction fencing | Reroute the global path |
| Crowd of pedestrians | Stop, wait for a gap |
| Dog off leash | Soft stop, go around |
An Out-of-Distribution (OOD) detector handles rare events: if the perception module's confidence falls below a threshold, the system enters safe-stop mode and requests an operator.
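The confidence-gated safe-stop logic above can be sketched as a tiny state machine. The threshold value, hysteresis (requiring several consecutive confident frames before resuming), and class names here are illustrative assumptions:

```python
NORMAL, SAFE_STOP = "NORMAL", "SAFE_STOP"

class OODGuard:
    def __init__(self, conf_threshold=0.6, recover_frames=10):
        self.threshold = conf_threshold
        self.recover_frames = recover_frames  # confident frames needed to resume
        self.state = NORMAL
        self._good_streak = 0

    def update(self, perception_confidence):
        """Process one frame; returns True iff an operator request should be
        issued this frame (only on the transition into safe-stop)."""
        if perception_confidence < self.threshold:
            first_trip = self.state == NORMAL
            self.state, self._good_streak = SAFE_STOP, 0
            return first_trip
        if self.state == SAFE_STOP:
            self._good_streak += 1
            if self._good_streak >= self.recover_frames:
                self.state = NORMAL  # enough confident frames: resume autonomy
        return False
```

The hysteresis prevents the robot from oscillating between stop and go when confidence hovers near the threshold.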
Human-in-the-Loop Teleoperation
Full autonomy is achievable only within an ODD (Operational Design Domain) with clear constraints. Initially, a share of edge cases is handled by teleoperators:
- Real-time video stream from 4 cameras (WebRTC, < 200 ms latency)
- The operator takes control via gamepad
- All teleoperation sessions are logged as training data (DAgger, Dataset Aggregation)
- As data accumulates, the share of manual interventions decreases
Typical dynamics: in the first month, 15-25% of missions need intervention; after 6 months, 1-3%.
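The DAgger logging loop can be sketched as follows: every operator takeover is recorded as an (observation, expert action) pair keyed by mission, which also yields the intervention-rate metric directly. The record schema and field names are assumptions, not a production format.

```python
import json
import time

class InterventionLogger:
    def __init__(self):
        self.records = []

    def log(self, observation, operator_action, mission_id):
        """Store one teleoperation frame as a DAgger training sample."""
        self.records.append({
            "ts": time.time(),
            "mission_id": mission_id,
            "observation": observation,    # e.g. flattened BEV + state vector
            "action": operator_action,     # (linear_vel, angular_vel) from gamepad
        })

    def intervention_rate(self, total_missions):
        """Fraction of missions with at least one operator takeover."""
        missions = {r["mission_id"] for r in self.records}
        return len(missions) / total_missions

    def export_jsonl(self):
        """Serialize records as JSON Lines for the training pipeline."""
        return "\n".join(json.dumps(r) for r in self.records)
```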
Fleet Management and Monitoring
Centralized Fleet Controller:
- Order dispatch: the nearest available robot, factoring in charge level and position
- Predictive charging: route energy calculation + a 20% buffer
- Real-time telemetry: geolocation and status of every robot (Kafka + TimescaleDB)
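The dispatch rule above (nearest idle robot whose battery covers the route energy plus a 20% buffer) can be sketched like this. The energy model (a flat Wh/km rate), robot record fields, and straight-line distance are simplifying assumptions; a real controller would use graph distances from the route planner.

```python
import math

def route_energy_wh(distance_km, wh_per_km=25.0):
    """Crude energy model: flat consumption per kilometre (assumed rate)."""
    return distance_km * wh_per_km

def dispatch(robots, pickup, drop, wh_per_km=25.0, buffer=0.2):
    """Pick the nearest idle robot that can complete the mission with margin.

    robots: list of dicts with 'id', 'pos' (x, y in km), 'battery_wh', 'idle'.
    Returns the chosen robot id, or None if no robot qualifies.
    """
    best = None
    for r in robots:
        if not r["idle"]:
            continue
        to_pickup = math.dist(r["pos"], pickup)
        mission_km = to_pickup + math.dist(pickup, drop)
        needed = route_energy_wh(mission_km, wh_per_km) * (1 + buffer)  # +20% buffer
        if r["battery_wh"] < needed:
            continue  # would risk stranding the robot mid-route
        if best is None or to_pickup < best[0]:
            best = (to_pickup, r["id"])
    return best[1] if best else None
```

Note that a closer robot with insufficient charge is skipped in favour of a farther one that can finish the mission.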
Operational efficiency metrics:
- Mission Success Rate: target > 95%
- Average Delivery Time vs. ETA: deviation < 10%
- Intervention Rate: % missions with teleoperation
- MTBF (Mean Time Between Failures): > 200 h
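Three of the four KPIs above can be computed directly from mission logs; a minimal sketch, assuming a per-mission record with `success`, `delivery_min`, `eta_min`, and `teleop` fields (an illustrative schema):

```python
def fleet_metrics(missions):
    """Aggregate per-mission logs into the fleet KPIs.

    missions: list of dicts with 'success' (bool), 'delivery_min' (actual),
    'eta_min' (promised), and 'teleop' (bool: any operator takeover).
    """
    n = len(missions)
    success_rate = sum(m["success"] for m in missions) / n
    # Mean relative deviation of actual delivery time from the promised ETA
    eta_deviation = sum(
        abs(m["delivery_min"] - m["eta_min"]) / m["eta_min"] for m in missions
    ) / n
    intervention_rate = sum(m["teleop"] for m in missions) / n
    return {
        "success_rate": success_rate,
        "eta_deviation": eta_deviation,
        "intervention_rate": intervention_rate,
    }
```

MTBF is left out since it is derived from failure timestamps across robot uptime, not per-mission records.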
Regulatory Aspects
Different jurisdictions impose different requirements. USA: NHTSA oversight; some states (California, Texas, Virginia) require special permits for sidewalk robots. Europe: GDPR compliance (face anonymization in video is mandatory) and national traffic codes.
Privacy-by-design in the technical implementation: face detection plus real-time blurring before any disk recording. Raw video is stored only for incidents; otherwise, only aggregated data is kept.
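The blur-before-write step can be sketched as below, operating on a frame represented as a 2-D grid of pixel intensities. The face detector's output format (axis-aligned boxes) and the mean-value blur are simplifying assumptions; a production pipeline would use a proper Gaussian or mosaic blur on real image buffers.

```python
def blur_region(frame, x0, y0, x1, y1):
    """Replace a rectangular region with its mean value (a crude blur stand-in).

    frame: 2-D list of pixel intensities; box is [x0, x1) x [y0, y1).
    """
    pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    mean = sum(pixels) // len(pixels)
    for y in range(y0, y1):
        for x in range(x0, x1):
            frame[y][x] = mean
    return frame

def anonymize(frame, face_boxes):
    """Blur every detected face box; must run before the frame is persisted."""
    for box in face_boxes:
        blur_region(frame, *box)
    return frame
```

The key design point is ordering: anonymization sits between the camera pipeline and the disk writer, so identifiable raw frames never touch storage.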
Timeline: an MVP with basic sidewalk navigation takes 4-5 months; a production system with fleet management and teleoperation, 9-12 months.







