Bayesian Robust Optimization for Imitation Learning

07/24/2020
by   Daniel S. Brown, et al.

One of the main challenges in imitation learning is determining what action an agent should take when it is outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and the corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches optimize a policy for either the mean or the MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic, as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user-specified risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk (CVaR). Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms.
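The trade-off the abstract describes can be illustrated with a minimal sketch: given policy returns evaluated under samples from a Bayesian reward posterior, blend the mean return with the CVaR of the worst-case tail. The function name, the `lam` trade-off parameter, and the empirical-quantile CVaR estimate below are illustrative assumptions, not the paper's exact linear-programming formulation.

```python
import numpy as np

def broil_objective(returns, lam=0.5, alpha=0.95):
    """Illustrative BROIL-style objective (not the paper's exact method).

    returns: array of policy returns, one per posterior reward sample.
    lam:     trade-off weight; 1.0 is risk-neutral, 0.0 is fully risk-averse.
    alpha:   CVaR confidence level.
    """
    returns = np.asarray(returns, dtype=float)
    expected = returns.mean()
    # Empirical CVaR: mean of returns at or below the (1 - alpha) quantile,
    # i.e. the average of the worst-case tail of the posterior returns.
    var_threshold = np.quantile(returns, 1 - alpha)
    cvar = returns[returns <= var_threshold].mean()
    # Interpolate between expected return and tail robustness.
    return lam * expected + (1 - lam) * cvar
```

Sweeping `lam` from 1 to 0 recovers the interpolation the abstract describes, from a purely return-maximizing objective to one that only scores the worst posterior outcomes.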

Related research

- Policy Gradient Bayesian Robust Optimization for Imitation Learning (06/11/2021)
  The difficulty in specifying rewards for many real-world problems has le...

- Risk Averse Bayesian Reward Learning for Autonomous Navigation from Human Demonstration (07/31/2021)
  Traditional imitation learning provides a set of methods and algorithms ...

- Scalable Bayesian Inverse Reinforcement Learning (02/12/2021)
  Bayesian inference over the reward presents an ideal solution to the ill...

- Incorrigibility in the CIRL Framework (09/19/2017)
  A value learning system has incentives to follow shutdown instructions, ...

- A Ranking Game for Imitation Learning (02/07/2022)
  We propose a new framework for imitation learning - treating imitation a...

- DIRECT: Learning from Sparse and Shifting Rewards using Discriminative Reward Co-Training (01/18/2023)
  We propose discriminative reward co-training (DIRECT) as an extension to...

- Challenging Common Assumptions in Convex Reinforcement Learning (02/03/2022)
  The classic Reinforcement Learning (RL) formulation concerns the maximiz...
