
Inverse Reinforcement Learning with Conditional Choice Probabilities

09/22/2017
by Mohit Sharma, et al.

We make an important connection to existing results in econometrics to describe an alternative formulation of inverse reinforcement learning (IRL). In particular, we describe an algorithm that solves the IRL problem using Conditional Choice Probabilities (CCPs), which are maximum likelihood estimates of the expert's policy obtained from demonstrations. Using the language of structural econometrics, we re-frame the optimal decision problem and introduce an alternative representation of value functions due to Hotz and Miller (1993). In addition to presenting the theoretical connections that bridge the IRL literatures of economics and robotics, the use of CCPs has the practical benefit of reducing the computational cost of solving the IRL problem. Specifically, under the CCP representation, we show how one can avoid repeated calls to the dynamic programming subroutine typically used in IRL. Through extensive experiments on standard IRL benchmarks, we show that CCP-IRL outperforms MaxEnt-IRL, achieving up to a 5x speedup without compromising the quality of the recovered reward function.
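As a rough illustration of the CCP idea in the abstract, the Python sketch below (not the authors' code) does two things: it estimates CCPs as empirical action frequencies per state, and it evaluates a candidate reward with a Hotz-Miller style representation that needs one linear solve instead of an iterated Bellman update. It assumes finite state and action spaces, a known transition tensor, a discount factor `beta`, and additive i.i.d. Type-I extreme value (logit) shocks with unit scale, under which the expected shock of the chosen action is `gamma_E - log p(a|s)`. The function names `estimate_ccps` and `ccp_value_function` and the `smoothing` parameter are hypothetical.

```python
import numpy as np

# Mean of a standard Type-I extreme value (Gumbel) shock.
EULER_GAMMA = 0.5772156649015329


def estimate_ccps(demos, n_states, n_actions, smoothing=1e-3):
    """Hypothetical helper: MLE of p(a|s) = count(s, a) / count(s).

    demos is a list of trajectories, each a list of (state, action) pairs.
    The additive smoothing term is an assumption of this sketch; it keeps
    every probability positive so that log p(a|s) stays finite.
    """
    counts = np.full((n_states, n_actions), float(smoothing))
    for trajectory in demos:
        for s, a in trajectory:
            counts[s, a] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def ccp_value_function(reward, ccps, transitions, beta):
    """Hypothetical helper: value of the observed policy under logit shocks.

    Solves V = sum_a p(a|s) [u(s,a) + gamma_E - log p(a|s)] + beta * P_pi V,
    i.e. V = (I - beta * P_pi)^{-1} * flow.
    reward: (S, A), ccps: (S, A) with strictly positive entries,
    transitions: (S, A, S) with transitions[s, a, s'] = P(s' | s, a).
    """
    n_states = reward.shape[0]
    # Expected per-period payoff under the CCPs, including the expected
    # shock of the chosen action, gamma_E - log p(a|s).
    flow = np.sum(ccps * (reward + EULER_GAMMA - np.log(ccps)), axis=1)
    # State-to-state transition matrix induced by the CCPs.
    p_pi = np.einsum("sa,sat->st", ccps, transitions)
    # A single linear solve replaces iterating a Bellman operator.
    return np.linalg.solve(np.eye(n_states) - beta * p_pi, flow)
```

In this sketch the CCPs stay fixed while the reward changes, so the matrix `I - beta * P_pi` is the same for every candidate reward and could be factored once. That is one way to see why the CCP representation can avoid repeated dynamic programming calls.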

06/07/2021

Identifiability in inverse reinforcement learning

Inverse reinforcement learning attempts to reconstruct the reward functi...
05/28/2021

Task-Guided Inverse Reinforcement Learning Under Partial Information

We study the problem of inverse reinforcement learning (IRL), where the ...
11/15/2021

Versatile Inverse Reinforcement Learning via Cumulative Rewards

Inverse Reinforcement Learning infers a reward function from expert demo...
11/07/2019

Option Compatible Reward Inverse Reinforcement Learning

Reinforcement learning with complex tasks is a challenging problem. Ofte...
03/04/2021

Inverse Reinforcement Learning with Explicit Policy Estimates

Various methods for solving the inverse reinforcement learning (IRL) pro...
07/12/2019

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

Imitation learning, followed by reinforcement learning algorithms, is a ...
12/13/2017

Inverse Reinforcement Learning for Marketing

Learning customer preferences from an observed behaviour is an important...