Learning Reward Models for Cooperative Trajectory Planning with Inverse Reinforcement Learning and Monte Carlo Tree Search

02/14/2022
by Karl Kurzer, et al.

Cooperative trajectory planning methods for automated vehicles can solve traffic scenarios that require a high degree of cooperation between traffic participants. For cooperative systems to integrate into human-centered traffic, the automated systems must behave in a human-like manner so that humans can anticipate their decisions. While Reinforcement Learning has made remarkable progress in solving the decision-making part, it is non-trivial to parameterize a reward model that yields predictable actions. This work employs feature-based Maximum Entropy Inverse Reinforcement Learning in combination with Monte Carlo Tree Search to learn reward models that maximize the likelihood of recorded multi-agent cooperative expert trajectories. The evaluation demonstrates that the approach can recover a reasonable reward model that mimics the expert and performs similarly to a manually tuned baseline reward model.
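
To make the learning objective concrete, the following is a minimal sketch of a feature-based Maximum Entropy IRL update, assuming a reward that is linear in trajectory features and a fixed set of sampled trajectories standing in for those a tree search would produce. It is not the authors' implementation: the function names, the learning rate, and the fixed sample set are all illustrative assumptions. In the approach described above, the samples would presumably be regenerated by Monte Carlo Tree Search as the reward weights change.

```python
import numpy as np


def maxent_irl_gradient(theta, expert_features, sampled_features):
    """Gradient of the mean expert log-likelihood for a linear reward model.

    theta:            (d,) reward weights (hypothetical parameterization)
    expert_features:  (n_expert, d) feature counts of expert trajectories
    sampled_features: (n_samples, d) feature counts of sampled trajectories,
                      used to approximate the partition function
    """
    rewards = sampled_features @ theta
    rewards = rewards - rewards.max()      # shift for numerical stability
    probs = np.exp(rewards)
    probs /= probs.sum()                   # softmax over sampled trajectories
    expected_features = probs @ sampled_features
    # MaxEnt IRL gradient: expert feature mean minus model feature expectation
    return expert_features.mean(axis=0) - expected_features


def fit_reward_weights(expert_features, sampled_features, lr=0.1, steps=200):
    """Plain gradient ascent on the likelihood (assumed optimizer, fixed samples)."""
    theta = np.zeros(expert_features.shape[1])
    for _ in range(steps):
        theta += lr * maxent_irl_gradient(theta, expert_features, sampled_features)
    return theta
```

The gradient vanishes when the feature expectation under the learned reward matches the expert's average features, which is the sense in which the learned reward model "mimics the expert".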