Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic

01/08/2019
by Mikael Henaff, et al.

Learning a policy using only observational data is challenging because the distribution of states the policy induces at execution time may differ from the distribution observed during training. We propose to train a policy by unrolling a learned model of the environment dynamics over multiple time steps while explicitly penalizing two costs: the original cost the policy seeks to optimize, and an uncertainty cost that represents the policy's divergence from the states it was trained on. We measure this second cost as the uncertainty of the dynamics model about its own predictions, drawing on recent ideas from uncertainty estimation for deep networks. We evaluate our approach on a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction.
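
The two-cost objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Monte Carlo dropout as the uncertainty estimator (the abstract does not name the technique), a toy one-dimensional state, a hand-written forward model standing in for the learned dynamics, and a linear policy. The names `PARAMS`, `KEEP_PROB`, `predict_next_state`, `mc_dropout_predict`, `unrolled_cost`, and the weight `lam` are all hypothetical.

```python
import math
import random
from statistics import mean, pvariance

KEEP_PROB = 0.8  # assumed dropout keep probability (illustrative)

# Hypothetical "learned" weights: one (w_state, w_action, w_out) triple per hidden unit.
PARAMS = [(0.9, 0.4, 0.10), (-0.5, 0.8, 0.20), (0.3, -0.6, -0.15), (0.7, 0.2, 0.05)]


def predict_next_state(params, state, action, mask):
    """Toy forward model: residual update through one hidden layer with dropout."""
    delta = 0.0
    for (w_s, w_a, w_out), keep in zip(params, mask):
        if keep:
            # Inverted dropout: rescale kept units by 1 / KEEP_PROB.
            delta += w_out * math.tanh(w_s * state + w_a * action) / KEEP_PROB
    return state + delta


def mc_dropout_predict(params, state, action, rng, n_samples=20):
    """Return mean and variance of predictions over sampled dropout masks."""
    preds = []
    for _ in range(n_samples):
        mask = [rng.random() < KEEP_PROB for _ in params]
        preds.append(predict_next_state(params, state, action, mask))
    return mean(preds), pvariance(preds)


def unrolled_cost(params, policy_gain, start_state, horizon=10, lam=0.5, seed=0):
    """Unroll the model under a linear policy; return (total, task, uncertainty)."""
    rng = random.Random(seed)
    state, task_cost, unc_cost = start_state, 0.0, 0.0
    for _ in range(horizon):
        action = -policy_gain * state      # hypothetical linear policy
        state, variance = mc_dropout_predict(params, state, action, rng)
        task_cost += state ** 2            # assumed task cost: squared distance from 0
        unc_cost += variance               # uncertainty cost: spread across dropout masks
    return task_cost + lam * unc_cost, task_cost, unc_cost
```

Calling `unrolled_cost(PARAMS, 0.5, 1.0)` returns the combined objective together with its two components. In practice the policy parameters would be optimized against this total, typically by gradient descent through the unrolled model; the sketch only evaluates the objective for a fixed policy.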

