AI System for Last-Mile Delivery Robots
Last-mile delivery robots operate in a fundamentally different environment than warehouse robots: unstructured urban sidewalks, surface irregularities, ramps, intersections, and unpredictable pedestrians. This makes the task significantly more complex and calls for a different approach to perception and decision-making.
Perception Stack
A delivery robot (of the Starship, Kiwibot, or Yandex Rover type) uses multiple sensing modalities:
Sensor package:
- 9-12 cameras for 360° view (fisheye, 1-2 Mpixel)
- 2-4 LiDAR (Livox Mid-360 or custom solid-state)
- Ultrasonic sensors for close range (< 1 m)
- RTK GPS for global localization + Visual SLAM for precise positioning
Object detection:
- YOLOv8 / RT-DETR for pedestrians, bicycles, vehicles
- Semantic segmentation (SegFormer) for surface classification: asphalt, grass, curb, puddle
- Depth estimation from stereo or monocular (UniDepth, DPT)
Everything runs on an edge computer: NVIDIA Jetson Orin NX or similar, with TensorRT optimization to reach 30+ FPS per stream.
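The detectors above emit per-camera results that must be merged into a single robot-frame object list before planning. A minimal sketch of that fusion step is below; the `Detection` structure, field names, and confidence threshold are illustrative assumptions, not from any vendor SDK.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "pedestrian", "bicycle" (illustrative classes)
    confidence: float  # 0..1 score from the detector head
    camera_id: int     # which of the 9-12 cameras produced this hit

def fuse_detections(per_camera, min_conf=0.5):
    """Merge detections from all camera streams, dropping low-confidence hits.

    per_camera: dict mapping camera_id -> list of (label, confidence) tuples.
    """
    fused = []
    for cam_id, dets in per_camera.items():
        for label, conf in dets:
            if conf >= min_conf:
                fused.append(Detection(label, conf, cam_id))
    # Highest-confidence objects first, for downstream prioritization
    return sorted(fused, key=lambda d: d.confidence, reverse=True)
```

In production this step would also deduplicate objects seen by overlapping cameras (e.g. via 3D position clustering); that is omitted here for brevity.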
Navigation in Urban Environment
Global route: planned over an HD map of sidewalks (OSM + custom annotation). A graph of passable segments with attributes: sidewalk width, surface type, curb presence, and nighttime lighting.
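Such a segment graph can be searched with a standard shortest-path algorithm where edge cost reflects the attributes, not just length. A hedged sketch, assuming a simple multiplicative penalty model (the penalty values and attribute set are illustrative):

```python
import heapq

def edge_cost(length_m, width_m, surface):
    """Segment cost = length penalized for poor surface and narrow width."""
    surface_penalty = {"asphalt": 1.0, "gravel": 1.3, "cobblestone": 1.6}[surface]
    width_penalty = 1.0 if width_m >= 1.5 else 2.0  # narrow sidewalks cost double
    return length_m * surface_penalty * width_penalty

def plan_route(graph, start, goal):
    """Dijkstra over the sidewalk graph.

    graph[u] = [(v, length_m, width_m, surface), ...]; assumes goal is reachable.
    Returns (node path, total cost).
    """
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, length, width, surface in graph.get(u, []):
            nd = d + edge_cost(length, width, surface)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]
```

Note how the width penalty changes routing: a shorter but narrow segment can lose to a longer, wide one.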
Local planner: this is where RL comes in. The agent is trained in Isaac Sim with photorealistic urban scenes (NVIDIA Omniverse). The task: over a 10-second horizon, select a trajectory that avoids collisions with pedestrians and obstacles.
Algorithm: TD3 (Twin Delayed DDPG) for a continuous velocity space. Input tensor: a Bird's Eye View (BEV) of 64×64 m around the robot with semantic layers, plus a state vector.
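To make the observation concrete, here is a sketch of rasterizing obstacles into a robot-centric BEV grid and appending the state vector, as a TD3 policy input might be assembled. No RL library is involved; the resolution (1 m/cell), single occupancy layer, and state fields are assumptions for illustration.

```python
def make_bev(obstacles, size_m=64, cell_m=1.0):
    """Rasterize obstacle points into an occupancy grid.

    obstacles: list of (x, y) in robot-centric metres; robot at the grid centre.
    Returns an n x n list-of-lists of 0/1 cells.
    """
    n = int(size_m / cell_m)
    grid = [[0] * n for _ in range(n)]
    half = size_m / 2.0
    for x, y in obstacles:
        col = int((x + half) / cell_m)
        row = int((y + half) / cell_m)
        if 0 <= row < n and 0 <= col < n:  # drop points outside the window
            grid[row][col] = 1
    return grid

def make_observation(grid, speed_mps, heading_rad, goal_dx, goal_dy):
    """Flatten the BEV layer and append the robot state vector."""
    flat = [c for row in grid for c in row]
    return flat + [speed_mps, heading_rad, goal_dx, goal_dy]
```

A real system would stack several semantic layers (pedestrians, drivable surface, curbs) as separate channels rather than a single occupancy plane.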
Working with Unstructured Obstacles
Urban sidewalks are full of edge cases absent from warehouse (WMS) scenarios:
| Situation | Strategy |
|---|---|
| Curb without a ramp | Route around / find a ramp |
| Puddle / snow | Reduce speed, go around |
| Construction fencing | Reroute the global path |
| Crowd of pedestrians | Stop, wait for a gap |
| Dog off leash | Soft stop, go around |
An Out-of-Distribution (OOD) detector handles rare events: if the perception module's confidence falls below a threshold, the system enters safe-stop mode and requests an operator.
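The confidence-gated safe-stop logic above can be sketched as a tiny state machine. The threshold value, hysteresis (requiring several consecutive confident frames before resuming), and class names here are illustrative assumptions:

```python
NORMAL, SAFE_STOP = "NORMAL", "SAFE_STOP"

class OODGuard:
    def __init__(self, conf_threshold=0.6, recover_frames=10):
        self.threshold = conf_threshold
        self.recover_frames = recover_frames  # confident frames needed to resume
        self.state = NORMAL
        self._good_streak = 0

    def update(self, perception_confidence):
        """Process one frame; returns True iff an operator request should be
        issued this frame (only on the transition into safe-stop)."""
        if perception_confidence < self.threshold:
            first_trip = self.state == NORMAL
            self.state, self._good_streak = SAFE_STOP, 0
            return first_trip
        if self.state == SAFE_STOP:
            self._good_streak += 1
            if self._good_streak >= self.recover_frames:
                self.state = NORMAL  # enough confident frames: resume autonomy
        return False
```

The hysteresis prevents the robot from oscillating between stop and go when confidence hovers near the threshold.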
Human-in-the-Loop Teleoperation
Full autonomy is achievable only within an ODD (Operational Design Domain) with clear constraints. Initially, a share of edge cases is handled by teleoperators:
- Real-time video stream from 4 cameras (WebRTC, < 200 ms latency)
- The operator takes control via gamepad
- All teleoperation sessions are logged as training data (DAgger, Dataset Aggregation)
- As data accumulates, the share of manual interventions decreases
Typical dynamics: in the first month, 15-25% of missions need intervention; after 6 months, 1-3%.
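The DAgger logging loop can be sketched as follows: every operator takeover is recorded as an (observation, expert action) pair keyed by mission, which also yields the intervention-rate metric directly. The record schema and field names are assumptions, not a production format.

```python
import json
import time

class InterventionLogger:
    def __init__(self):
        self.records = []

    def log(self, observation, operator_action, mission_id):
        """Store one teleoperation frame as a DAgger training sample."""
        self.records.append({
            "ts": time.time(),
            "mission_id": mission_id,
            "observation": observation,    # e.g. flattened BEV + state vector
            "action": operator_action,     # (linear_vel, angular_vel) from gamepad
        })

    def intervention_rate(self, total_missions):
        """Fraction of missions with at least one operator takeover."""
        missions = {r["mission_id"] for r in self.records}
        return len(missions) / total_missions

    def export_jsonl(self):
        """Serialize records as JSON Lines for the training pipeline."""
        return "\n".join(json.dumps(r) for r in self.records)
```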
Fleet Management and Monitoring
Centralized Fleet Controller:
- Order dispatch: the nearest available robot, factoring in charge level and position
- Predictive charging: route energy calculation + a 20% buffer
- Real-time telemetry: geolocation and status of every robot (Kafka + TimescaleDB)
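The dispatch rule above (nearest idle robot whose battery covers the route energy plus a 20% buffer) can be sketched like this. The energy model (a flat Wh/km rate), robot record fields, and straight-line distance are simplifying assumptions; a real controller would use graph distances from the route planner.

```python
import math

def route_energy_wh(distance_km, wh_per_km=25.0):
    """Crude energy model: flat consumption per kilometre (assumed rate)."""
    return distance_km * wh_per_km

def dispatch(robots, pickup, drop, wh_per_km=25.0, buffer=0.2):
    """Pick the nearest idle robot that can complete the mission with margin.

    robots: list of dicts with 'id', 'pos' (x, y in km), 'battery_wh', 'idle'.
    Returns the chosen robot id, or None if no robot qualifies.
    """
    best = None
    for r in robots:
        if not r["idle"]:
            continue
        to_pickup = math.dist(r["pos"], pickup)
        mission_km = to_pickup + math.dist(pickup, drop)
        needed = route_energy_wh(mission_km, wh_per_km) * (1 + buffer)  # +20% buffer
        if r["battery_wh"] < needed:
            continue  # would risk stranding the robot mid-route
        if best is None or to_pickup < best[0]:
            best = (to_pickup, r["id"])
    return best[1] if best else None
```

Note that a closer robot with insufficient charge is skipped in favour of a farther one that can finish the mission.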
Operational efficiency metrics:
- Mission Success Rate: target > 95%
- Average Delivery Time vs. ETA: deviation < 10%
- Intervention Rate: % missions with teleoperation
- MTBF (Mean Time Between Failures): > 200 h
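Three of the four KPIs above can be computed directly from mission logs; a minimal sketch, assuming a per-mission record with `success`, `delivery_min`, `eta_min`, and `teleop` fields (an illustrative schema):

```python
def fleet_metrics(missions):
    """Aggregate per-mission logs into the fleet KPIs.

    missions: list of dicts with 'success' (bool), 'delivery_min' (actual),
    'eta_min' (promised), and 'teleop' (bool: any operator takeover).
    """
    n = len(missions)
    success_rate = sum(m["success"] for m in missions) / n
    # Mean relative deviation of actual delivery time from the promised ETA
    eta_deviation = sum(
        abs(m["delivery_min"] - m["eta_min"]) / m["eta_min"] for m in missions
    ) / n
    intervention_rate = sum(m["teleop"] for m in missions) / n
    return {
        "success_rate": success_rate,
        "eta_deviation": eta_deviation,
        "intervention_rate": intervention_rate,
    }
```

MTBF is left out since it is derived from failure timestamps across robot uptime, not per-mission records.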
Regulatory Aspects
Different jurisdictions impose different requirements. USA: NHTSA oversight; some states (California, Texas, Virginia) require special permits for sidewalk robots. Europe: GDPR compliance (face anonymization in video is mandatory) and national traffic codes.
Privacy-by-design in the technical implementation: face detection plus real-time blurring before any disk recording. Raw video is stored only for incidents; otherwise, only aggregated data is kept.
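The blur-before-write step can be sketched as below, operating on a frame represented as a 2-D grid of pixel intensities. The face detector's output format (axis-aligned boxes) and the mean-value blur are simplifying assumptions; a production pipeline would use a proper Gaussian or mosaic blur on real image buffers.

```python
def blur_region(frame, x0, y0, x1, y1):
    """Replace a rectangular region with its mean value (a crude blur stand-in).

    frame: 2-D list of pixel intensities; box is [x0, x1) x [y0, y1).
    """
    pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    mean = sum(pixels) // len(pixels)
    for y in range(y0, y1):
        for x in range(x0, x1):
            frame[y][x] = mean
    return frame

def anonymize(frame, face_boxes):
    """Blur every detected face box; must run before the frame is persisted."""
    for box in face_boxes:
        blur_region(frame, *box)
    return frame
```

The key design point is ordering: anonymization sits between the camera pipeline and the disk writer, so identifiable raw frames never touch storage.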
Timeline: an MVP with basic sidewalk navigation takes 4-5 months; a production system with fleet management and teleoperation, 9-12 months.







