AI Path Planning System for Autonomous Vehicles
Path planning in autonomous vehicles is a three-level task: global (route A→B), local (obstacle avoidance), and reactive (emergency braking within ~100 ms). RL is best suited to the local and reactive levels, where classical algorithms (A*, RRT, MPC) require hand-coding hundreds of edge cases.
Autonomous Driving Stack
Perception: LiDAR (Velodyne, Ouster, Livox) + Camera (RGB, stereo) + Radar + GPS/IMU. Fusion via Kalman filtering or learned (deep) fusion.
Localization: NDT matching, LOAM/LIO-SAM, HD Map matching. Accuracy: <10 cm in cities.
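The Kalman-style fusion step mentioned above reduces, in the simplest scalar case, to blending a prediction (e.g., IMU integration) with a noisy measurement (e.g., GPS position) weighted by their variances. A minimal 1-D sketch with illustrative numbers:

```python
def kalman_update(x_pred, P_pred, z, R):
    """Fuse a predicted state (variance P_pred) with a measurement z
    (variance R). Scalar 1-D case of the Kalman correction step."""
    K = P_pred / (P_pred + R)          # Kalman gain: trust ratio
    x = K * (z - x_pred) + x_pred      # corrected estimate
    P = (1 - K) * P_pred               # reduced covariance
    return x, P

# IMU integration predicts 10.0 m (variance 4.0); GPS reads 10.8 m (variance 1.0)
x, P = kalman_update(10.0, 4.0, 10.8, 1.0)
# the estimate moves most of the way toward the more certain GPS reading
```

The same structure generalizes to the matrix form used for full-state LiDAR/GPS/IMU fusion.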
Planning:
- Global: A* / Dijkstra on HD Map (OpenStreetMap + Lanelet2)
- Local/reactive: RL, MPC, lattice planner
Control: Pure Pursuit, Stanley, MPC → steering/throttle/brake signals.
Frameworks: Autoware (ROS2-based, open source), Apollo (Baidu), CARLA (simulator).
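The global level can be illustrated with a toy A* search on a 4-connected occupancy grid (Manhattan heuristic). A real stack runs this on a Lanelet2 road graph rather than a grid; the sketch only shows the algorithm:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 0/1 occupancy grid, 4-connected, Manhattan-distance heuristic."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]   # (f, g, node, path)
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and not grid[r][c]:
                heapq.heappush(open_set,
                               (g + 1 + h((r, c)), g + 1, (r, c), path + [(r, c)]))
    return None  # no route exists

grid = [[0, 0, 0],
        [1, 1, 0],   # wall across most of row 1
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))  # detours around the wall
```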
RL for Local Planning
CARLA simulator: Python/C++ API, photorealistic rendering, vehicle physics, pedestrians, and other traffic participants. Well suited to training RL agents.
State space:
# Bird's-eye view (BEV) representation
state = {
    'bev_map': np.zeros((256, 256, 7)),   # occupancy, lanes, route, vehicles...
    'ego_state': np.array([vx, vy, heading, steer, throttle]),
    'route_waypoints': waypoints[:20],    # next 20 waypoints
    'traffic_light': light_state          # red / yellow / green
}
Action space (continuous):
from gymnasium import spaces

action_space = spaces.Box(
    low=np.array([-1.0, 0.0, 0.0]),    # steer, throttle, brake
    high=np.array([1.0, 1.0, 1.0]),
    dtype=np.float32
)
# or discrete: {turn_left, go_straight, turn_right, stop}
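Note one practical detail: the actor network later in this section ends in Tanh, producing values in [-1, 1]³, while the Box above bounds throttle and brake to [0, 1]. A common fix is a per-dimension affine rescale; a minimal sketch:

```python
import numpy as np

LOW  = np.array([-1.0, 0.0, 0.0])   # steer, throttle, brake lower bounds
HIGH = np.array([ 1.0, 1.0, 1.0])   # upper bounds

def squash_to_box(tanh_action):
    """Map a tanh output in [-1, 1]^n onto [LOW, HIGH] element-wise."""
    a = np.asarray(tanh_action, dtype=np.float64)
    return LOW + (a + 1.0) * 0.5 * (HIGH - LOW)

out = squash_to_box([0.0, 0.0, -1.0])  # steer 0.0, throttle 0.5, brake 0.0
```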
Reward shaping for safe driving:
def compute_reward(self, action, info):
    reward = 0.0
    # route progress
    reward += info['route_completion'] * 5.0
    # speed (not too slow, not too fast)
    target_speed = 30 / 3.6            # 30 km/h in m/s
    reward -= abs(info['speed'] - target_speed) * 0.1
    # violations
    if info['collision']:
        reward -= 100.0
    if info['lane_invasion']:
        reward -= 10.0
    if info['red_light_violation']:
        reward -= 50.0
    # comfort: penalize abrupt control changes (jerk proxy)
    jerk = abs(action[1] - self.prev_throttle) + abs(action[0] - self.prev_steer)
    reward -= jerk * 0.5
    self.prev_steer, self.prev_throttle = action[0], action[1]
    return reward
Neural Network Architecture
CNN + LSTM for BEV input:
import torch
import torch.nn as nn

class ADPlanningNet(nn.Module):
    def __init__(self):
        super().__init__()
        # BEV encoder: (7, 256, 256) input → 2048-d feature
        self.bev_encoder = nn.Sequential(
            nn.Conv2d(7, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten()
        )  # → 128 * 4 * 4 = 2048
        # waypoints encoder: (x, y) sequence → 64-d final hidden state
        self.waypoint_encoder = nn.GRU(2, 64, batch_first=True)
        # actor: BEV + waypoints + 5-d ego state → steer, throttle, brake
        self.actor = nn.Sequential(
            nn.Linear(2048 + 64 + 5, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Tanh()
        )

    def forward(self, bev, waypoints, ego_state):
        bev_feat = self.bev_encoder(bev)          # (B, 2048)
        _, h = self.waypoint_encoder(waypoints)   # h: (1, B, 64)
        x = torch.cat([bev_feat, h.squeeze(0), ego_state], dim=-1)
        return self.actor(x)                      # (B, 3) in [-1, 1]
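The GRU waypoint encoder returns both per-step outputs and a final hidden state; only the final hidden state is concatenated with the BEV features. A self-contained sketch of that pattern (batch-first tensors, same shapes as `waypoint_encoder` above):

```python
import torch
import torch.nn as nn

gru = nn.GRU(2, 64, batch_first=True)   # (x, y) input → 64-d hidden state
wps = torch.randn(8, 20, 2)             # batch of 8 routes, 20 waypoints each
out, h = gru(wps)                       # out: (8, 20, 64), h: (1, 8, 64)
feat = h.squeeze(0)                     # (8, 64): one summary vector per route
```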
Transformer for multi-agent scenarios: attention over other traffic participants (pedestrians, vehicles); scales to a variable number of objects.
Hierarchical Planning
Level 3: Mission Planner (A* on HD Map) → sequence of waypoints
Level 2: Behavioral layer (RL agent) → tactic choice: overtake / follow / yield
Level 1: Motion Planner (MPC / lattice) → trajectory for 3–5 s
Level 0: PID Controller → actuator commands @ 100 Hz
RL at Level 2 makes high-level decisions (overtake/don't overtake) at ~10 Hz. MPC at Level 1 ensures smooth trajectory following.
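The rate decoupling between levels can be sketched as holding the last behavioral decision for 10 control ticks before querying the RL agent again (function and tactic names here are illustrative, not Autoware/Apollo APIs):

```python
def run_stack(behavioral_policy, motion_planner, controller, get_state, n_ticks=100):
    """Level 2 at 10 Hz, Levels 1/0 at 100 Hz: reuse the last tactic
    for 10 control ticks, replan the trajectory and control every tick."""
    tactic = None
    for tick in range(n_ticks):
        state = get_state()
        if tick % 10 == 0:                          # 100 Hz / 10 = 10 Hz
            tactic = behavioral_policy(state)       # overtake / follow / yield
        trajectory = motion_planner(state, tactic)  # 3-5 s horizon
        yield controller(state, trajectory)         # actuator command @ 100 Hz

# usage with stub components, counting how often the RL agent is queried
calls = {'behavioral': 0}
def policy(state):
    calls['behavioral'] += 1
    return 'follow'

cmds = list(run_stack(policy, lambda s, t: t, lambda s, tr: tr, lambda: None))
```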
Safety Constraints
For autonomous vehicles, a safety layer above RL is mandatory:
RSS (Responsibility-Sensitive Safety): Intel's formal safety model. Computes safe distances in real time and overrides the RL action when a violation is imminent.
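The RSS longitudinal safe distance has a closed form: with rear-vehicle speed v_r, front-vehicle speed v_f, response time ρ, maximum acceleration a_max during the response, guaranteed rear braking b_min, and strongest front braking b_max, the rear car must keep d_min = v_r·ρ + ½·a_max·ρ² + (v_r + ρ·a_max)² / (2·b_min) − v_f² / (2·b_max), clamped at 0. A direct transcription, with illustrative parameter values:

```python
def rss_safe_distance(v_rear, v_front, rho=0.5, a_max=3.0, b_min=4.0, b_max=8.0):
    """RSS minimum longitudinal gap (m). Speeds in m/s, rho = response
    time (s), a_max = worst-case accel during response, b_min = rear car's
    guaranteed braking, b_max = front car's strongest possible braking."""
    v_resp = v_rear + rho * a_max            # rear speed after response time
    d = (v_rear * rho
         + 0.5 * a_max * rho**2
         + v_resp**2 / (2 * b_min)
         - v_front**2 / (2 * b_max))
    return max(d, 0.0)

d = rss_safe_distance(20.0, 20.0)  # both cars at ~72 km/h
```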
Control Barrier Functions (CBF): Mathematically guaranteed collision avoidance through RL action modification.
# illustrative API: cbf_safety / CBFSafetyLayer are placeholder names
from cbf_safety import CBFSafetyLayer

safety_layer = CBFSafetyLayer(safety_margin=1.5)  # 1.5 m buffer
raw_action = rl_policy.predict(state)
safe_action = safety_layer.project(raw_action, obstacles)
# safe_action stays close to raw_action but satisfies the barrier constraints
Testing
Scenario-based testing: CARLA's scenario library (ScenarioRunner) covers corner cases: a pedestrian suddenly darting into the road, a wrong-way ("ghost") vehicle, a slippery road surface.
Adversarial testing: RL adversarial agent intentionally creates difficult situations to test the policy.
Metrics:
- Route completion rate (RCR) — % of successfully completed routes
- Infraction rate — infractions per 1 km
- Comfort: acceleration/jerk metrics
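The two headline metrics above can be computed directly from per-episode logs; the field names here are illustrative, not a CARLA leaderboard schema:

```python
def evaluate(episodes):
    """episodes: list of dicts with 'completed' (bool), 'km' (float driven),
    'infractions' (int). Returns (route completion rate, infractions per km)."""
    rcr = sum(e['completed'] for e in episodes) / len(episodes)
    total_km = sum(e['km'] for e in episodes)
    infractions_per_km = sum(e['infractions'] for e in episodes) / total_km
    return rcr, infractions_per_km

eps = [{'completed': True,  'km': 2.0, 'infractions': 1},
       {'completed': False, 'km': 1.0, 'infractions': 2},
       {'completed': True,  'km': 3.0, 'infractions': 0}]
rcr, ipk = evaluate(eps)  # 2 of 3 routes completed; 3 infractions over 6 km
```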
Timeline: 24–48 weeks
Basic RL agent in CARLA on straight urban routes — 12 weeks. Full system with hierarchy, safety layer, and complex scenarios — 24–32 weeks. Real hardware (road tests) is beyond the scope of a typical software project.