
Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

by Daniel S. Brown, et al.
The University of Texas at Austin

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding optimal demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing an equivalence to the set cover problem, and use this equivalence to develop an efficient algorithm for determining the set of maximally informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: benchmarking active learning IRL algorithms and developing an IRL algorithm that, rather than assuming demonstrations are i.i.d., uses counterfactual reasoning over informative demonstrations to learn more efficiently.
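Because the abstract reduces demonstration selection to set cover, a small sketch of the standard greedy set cover approximation may help make the idea concrete. This is an illustration under stated assumptions, not the authors' published algorithm: it assumes each candidate demonstration has already been mapped to the set of reward constraints it reveals, and that the target reward equivalence class is described by the union of those constraints. The function name `greedy_set_cover`, the demo names, and the constraint labels `c1`..`c5` are all hypothetical. How constraints are derived from demonstrations is not specified in the abstract and is left out of the sketch.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def greedy_set_cover(
    universe: Set[Hashable],
    candidates: Dict[str, FrozenSet[Hashable]],
) -> List[str]:
    """Greedily choose candidate demonstrations until every constraint is covered.

    Hypothetical setup: `universe` holds the constraint IDs that define the
    target reward equivalence class, and `candidates` maps each candidate
    demonstration name to the constraint IDs it would reveal.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        if not candidates:
            raise ValueError("no candidate demonstrations to choose from")
        # Pick the demonstration that reveals the most still-uncovered constraints;
        # ties go to the candidate that appears first in `candidates`.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            # No candidate adds anything new, so the remaining constraints are unreachable.
            raise ValueError(f"constraints {sorted(map(str, uncovered))} cannot be covered")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy example with made-up constraint labels c1..c5.
demos = {
    "demo_A": frozenset({"c1", "c2", "c3"}),
    "demo_B": frozenset({"c2", "c4"}),
    "demo_C": frozenset({"c4", "c5"}),
    "demo_D": frozenset({"c5"}),
}
print(greedy_set_cover({"c1", "c2", "c3", "c4", "c5"}, demos))  # ['demo_A', 'demo_C']
```

Greedy selection is used here because finding an exact minimum set cover is NP-hard; the greedy choice is guaranteed to use at most roughly ln n + 1 times as many sets as the optimum, where n is the number of constraints to cover.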


