Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

07/03/2017
by Daniel S. Brown, et al.

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting---where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the α-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task and achieve tighter and more accurate bounds than a feature count-based baseline. We also give examples of how our proposed bound can be utilized to perform risk-aware policy selection and risk-aware policy improvement. Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.

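The bound described above can be pictured as a simple post-processing step over posterior reward samples. Below is a minimal illustrative sketch (not the authors' implementation): it assumes reward functions have already been sampled from a Bayesian IRL posterior (e.g., via MCMC), and it relies on hypothetical helper functions `solve_optimal_return` and `evaluate_policy_return` for value iteration and policy evaluation; those names, and the function `alpha_worst_case_bound`, are assumptions introduced here for illustration.

```python
import numpy as np

def alpha_worst_case_bound(posterior_rewards, eval_policy, mdp,
                           solve_optimal_return, evaluate_policy_return,
                           alpha=0.95):
    """Estimate an alpha-worst-case upper bound on the loss of `eval_policy`
    relative to the optimal policy under the expert's unknown reward.

    posterior_rewards: reward functions sampled from a Bayesian IRL posterior.
    solve_optimal_return / evaluate_policy_return: assumed helpers that return
    the expected return of the optimal policy and of `eval_policy`, respectively,
    for a given reward function on `mdp`.
    """
    losses = []
    for reward in posterior_rewards:
        # Expected return of the best policy under this sampled reward.
        v_opt = solve_optimal_return(mdp, reward)
        # Expected return of the policy being certified, under the same reward.
        v_eval = evaluate_policy_return(mdp, eval_policy, reward)
        # Policy loss (expected value difference) under this reward hypothesis.
        losses.append(v_opt - v_eval)
    # The alpha-quantile of the loss distribution serves as the alpha-worst-case
    # bound: with probability roughly alpha over the posterior, the true loss
    # does not exceed this value.
    return np.quantile(losses, alpha)
```

In this sketch, tightening the bound amounts to drawing more posterior samples or choosing a smaller alpha, which trades off confidence against conservatism.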