Policy Gradient Bayesian Robust Optimization for Imitation Learning

06/11/2021
by Zaynah Javed, et al.

The difficulty of specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, many different reward functions can often explain the same human feedback, leaving agents uncertain about what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective balancing expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm that is robust to a distribution of reward hypotheses and can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse. The results also suggest that, when learning from ambiguous demonstrations, it outperforms state-of-the-art imitation learning algorithms by hedging against uncertainty rather than seeking to uniquely identify the demonstrator's reward function.
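As a rough illustration of what a soft-robust objective over reward hypotheses can look like, the sketch below blends the expected return with a tail-risk term. The abstract does not give the formula, so this is an assumption: it uses conditional value at risk (CVaR) as the risk measure and a finite set of sampled reward hypotheses. The names `cvar`, `soft_robust`, `lam`, and `alpha` are hypothetical, and the code only evaluates the objective for fixed returns; it is not the policy gradient derived in the paper.

```python
import numpy as np

def cvar(values, probs, alpha):
    """Mean of the worst (1 - alpha) probability tail of a discrete
    distribution, where lower values are worse. Requires 0 <= alpha < 1."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)                  # worst outcomes first
    v, p = values[order], probs[order]
    tail = 1.0 - alpha                          # probability mass of the tail
    # Probability mass already consumed before each sorted outcome.
    mass_before = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    # Mass taken from each outcome when filling the tail from the worst end.
    take = np.clip(tail - mass_before, 0.0, p)
    return float(np.dot(take, v) / tail)

def soft_robust(returns, probs, lam, alpha):
    """Assumed objective: lam * E[return] + (1 - lam) * CVaR_alpha[return].
    returns[i] is the policy's expected return under reward hypothesis i,
    probs[i] is that hypothesis's posterior probability."""
    returns = np.asarray(returns, dtype=float)
    probs = np.asarray(probs, dtype=float)
    expected = float(np.dot(probs, returns))
    return lam * expected + (1.0 - lam) * cvar(returns, probs, alpha)

# Four equally likely reward hypotheses (made-up numbers for illustration).
returns = [1.0, 2.0, 3.0, 4.0]
probs = [0.25, 0.25, 0.25, 0.25]
risk_neutral = soft_robust(returns, probs, lam=1.0, alpha=0.5)
risk_averse = soft_robust(returns, probs, lam=0.0, alpha=0.5)
blended = soft_robust(returns, probs, lam=0.5, alpha=0.5)
```

Under these assumptions, setting `lam` to 1 recovers the risk-neutral expected return, setting it to 0 scores the policy only on its worst-case tail, and intermediate values trace out the family of behaviors between the two.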


