Inverse Reinforcement Learning with Explicit Policy Estimates

03/04/2021
by Navyata Sanghvi, et al.

Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy and the objective gradient. We demonstrate key computational and algorithmic differences which arise between the methods due to an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.
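To make the "soft value function" and its "associated policy" concrete, here is a minimal sketch of tabular soft value iteration, the entropy-regularized backup commonly used with Maximum Causal Entropy IRL. It is an illustration under stated assumptions, not the authors' implementation: the function names, the two-state toy MDP, the discount factor, and the iteration count are all hypothetical, and the value function is computed exactly here rather than approximated as in the methods the abstract compares.

```python
import math

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) over a list of floats.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def soft_value_iteration(P, r, gamma=0.9, iters=500):
    """Tabular soft value iteration (hypothetical helper).

    P[s][a] is a list of next-state probabilities, r[s][a] is a reward.
    Returns the soft value V, soft action values Q, and the policy
    pi(a|s) = exp(Q(s,a) - V(s)).
    """
    n_states = len(r)
    V = [0.0] * n_states
    Q = [[0.0] * len(r[s]) for s in range(n_states)]
    for _ in range(iters):
        # Soft Bellman backup: Q(s,a) = r(s,a) + gamma * E[V(s')].
        Q = [[r[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
              for a in range(len(r[s]))]
             for s in range(n_states)]
        # The soft maximum (log-sum-exp) replaces the hard max over actions.
        V = [logsumexp(q) for q in Q]
    policy = [[math.exp(q - V[s]) for q in Q[s]] for s in range(n_states)]
    return V, Q, policy

# Assumed toy MDP: action 0 moves to state 0 with reward 0,
# action 1 moves to state 1 with reward 1, from either state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
r = [[0.0, 1.0],
     [0.0, 1.0]]

V, Q, policy = soft_value_iteration(P, r)
print(policy)
```

In this toy problem the two actions differ by exactly one unit of soft action value in every state, so the resulting policy prefers action 1 without ever becoming deterministic. That stochasticity is what lets a likelihood-based objective explain imperfect expert behavior.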


