OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

09/09/2021
by   Hana Hoshino, et al.
29

Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts off-policy data distribution instead of on-policy and enables significant reduction of the number of interactions with the environment, (2) learns a stationary reward function that is transferable with high generalization capabilities on changing dynamics, and (3) leverages mode-covering behavior for faster convergence. We demonstrate that our method is considerably more sample efficient and generalizes to novel environments through the experiments. Our method achieves better or comparable results on policy performance baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior arts are prone to fail.

READ FULL TEXT

page 1

page 5

page 10

page 11

page 12

research
09/09/2018

Addressing Sample Inefficiency and Reward Bias in Inverse Reinforcement Learning

The Generative Adversarial Imitation Learning (GAIL) framework from Ho &...
research
02/01/2023

Internally Rewarded Reinforcement Learning

We study a class of reinforcement learning problems where the reward sig...
research
06/24/2020

Quantifying Differences in Reward Functions

For many tasks, the reward function is too complex to be specified proce...
research
07/18/2022

Active Exploration for Inverse Reinforcement Learning

Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferrin...
research
09/25/2022

Reward Learning using Structural Motifs in Inverse Reinforcement Learning

The Inverse Reinforcement Learning (IRL) problem has seen rapid evolutio...
research
09/15/2023

A Bayesian Approach to Robust Inverse Reinforcement Learning

We consider a Bayesian approach to offline model-based inverse reinforce...
research
09/27/2017

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Reward engineering is an important aspect of reinforcement learning. Whe...

Please sign up or login with your details

Forgot password? Click here to reset