Inverse Reinforcement Learning without Reinforcement Learning

03/26/2023
by Gokul Swamy, et al.

Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: they require repeatedly solving a hard reinforcement learning (RL) problem as a subroutine. This is counter-intuitive from the viewpoint of reductions: we have reduced the easier problem of imitation learning to repeatedly solving the harder problem of RL. Another thread of work has proved that access to a piece of side information, the distribution of states in which a strong policy spends its time, can dramatically reduce the sample and computational complexity of solving an RL problem. In this work, we demonstrate for the first time a more informed imitation learning reduction in which we use the expert's state distribution to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory. In practice, we find that we are able to significantly speed up the prior art on continuous control tasks.
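The sketch below illustrates the kind of idea the abstract describes, under loose assumptions: inside an adversarial IRL loop, the RL subroutine resets its rollouts to states sampled from the expert's demonstrations rather than the environment's start state, so the learner never has to explore globally. The toy tabular MDP, the occupancy-matching reward update, and every name in the snippet are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's algorithm): IRL with an RL subroutine that
# resets to states drawn from the expert's state distribution.
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 8, 2, 10                         # toy tabular MDP: states, actions, horizon
P = rng.dirichlet(np.ones(S), size=(S, A))  # fixed random dynamics P[s, a] -> next-state dist.

def rollout(policy, reward, start_state):
    """Roll out `policy` from `start_state`; return visited (s, a) pairs and total reward."""
    s, traj, ret = start_state, [], 0.0
    for _ in range(H):
        a = rng.choice(A, p=policy[s])
        traj.append((s, a))
        ret += reward[s, a]
        s = rng.choice(S, p=P[s, a])
    return traj, ret

# "Expert" demonstrations: trajectories from a fixed reference policy (stand-in for real data).
expert_policy = np.full((S, A), 1.0 / A)
expert_policy[:, 0], expert_policy[:, 1] = 0.9, 0.1
expert_trajs = [rollout(expert_policy, np.zeros((S, A)), 0)[0] for _ in range(50)]
expert_states = np.array([s for traj in expert_trajs for s, _ in traj])
expert_sa = np.zeros((S, A))
for traj in expert_trajs:
    for s, a in traj:
        expert_sa[s, a] += 1
expert_sa /= expert_sa.sum()

reward = np.zeros((S, A))           # learned reward (adversary)
logits = np.zeros((S, A))           # learner policy parameters (tabular softmax)
policy = np.full((S, A), 1.0 / A)

for outer in range(200):
    # RL subroutine with expert resets: start each rollout from a state the expert
    # visits, sidestepping the global exploration the abstract refers to.
    learner_sa = np.zeros((S, A))
    for _ in range(20):
        s0 = rng.choice(expert_states)
        traj, _ = rollout(policy, reward, s0)
        for s, a in traj:
            learner_sa[s, a] += 1
    learner_sa /= learner_sa.sum()

    # Policy improvement: soft update toward actions with higher learned reward.
    logits += 0.5 * reward
    policy = np.exp(logits - logits.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)

    # Reward update: reward state-action pairs the expert visits more than the
    # learner, in the spirit of adversarial / occupancy-matching IRL formulations.
    reward = np.clip(reward + 0.1 * (expert_sa - learner_sa), -1.0, 1.0)

print("occupancy gap at last iteration:", np.abs(expert_sa - learner_sa).sum())
```

In a real continuous-control setting the tabular counts would be replaced by a discriminator and a policy-gradient learner, but the structural point is the same: the inner RL problem is only ever solved from states the expert already covers.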


Related research

08/10/2021 · Imitation Learning by Reinforcement Learning
Imitation Learning algorithms learn a policy from demonstrations of expe...

03/06/2017 · Third-Person Imitation Learning
Reinforcement learning (RL) makes it possible to train agents capable of...

03/01/2018 · Hierarchical Imitation and Reinforcement Learning
We study the problem of learning policies over long time horizons. We pr...

07/18/2022 · A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation
The current paper studies sample-efficient Reinforcement Learning (RL) i...

02/03/2022 · Challenging Common Assumptions in Convex Reinforcement Learning
The classic Reinforcement Learning (RL) formulation concerns the maximiz...

07/05/2022 · Planning with RL and episodic-memory behavioral priors
The practical application of learning agents requires sample efficient a...

09/25/2022 · Unsupervised Reward Shaping for a Robotic Sequential Picking Task from Visual Observations in a Logistics Scenario
We focus on an unloading problem, typical of the logistics sector, model...
