Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

05/20/2018
by Daniel S. Brown, et al.

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding optimal demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing an equivalence to the set cover problem, and use this equivalence to develop an efficient algorithm for determining the set of maximally informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: benchmarking active learning IRL algorithms and developing an IRL algorithm that, rather than assuming demonstrations are i.i.d., uses counterfactual reasoning over informative demonstrations to learn more efficiently.
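
To make the set cover connection concrete, the following is a minimal sketch of the standard greedy set cover approximation, not the algorithm from the paper itself. It assumes, purely for illustration, that each candidate demonstration has already been reduced to the set of reward constraints it induces. The names `greedy_set_cover`, `constraints`, and `demos` are hypothetical and do not come from the paper.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(
    universe: Set[Hashable],
    candidates: Dict[str, Set[Hashable]],
) -> List[str]:
    """Greedily pick candidates until every element of `universe` is covered.

    Assumption: `universe` stands in for the constraints that pin down the
    demonstrator's reward equivalence class, and each candidate is a
    demonstration mapped to the constraints it reveals.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered elements.
        # Sorting the names makes tie-breaking deterministic.
        best = max(
            sorted(candidates),
            key=lambda name: len(candidates[name] & uncovered),
            default=None,
        )
        gain = candidates[best] & uncovered if best is not None else set()
        if not gain:
            raise ValueError("remaining elements cannot be covered")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy data (hypothetical): five constraints, four candidate demonstrations.
constraints = {"c1", "c2", "c3", "c4", "c5"}
demos = {
    "demo_a": {"c1", "c2", "c3"},
    "demo_b": {"c3", "c4"},
    "demo_c": {"c4", "c5"},
    "demo_d": {"c1"},
}
selected = greedy_set_cover(constraints, demos)
print(selected)
```

On this toy input the greedy rule first selects `demo_a` (three constraints) and then `demo_c` (the two remaining ones). Greedy selection is a well-known approximation for set cover and does not guarantee a minimum-size cover in general.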


