CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning

02/09/2023
by Sheng Yue, et al.

This work tackles a major challenge in offline Inverse Reinforcement Learning (IRL): reward extrapolation error, where the learned reward function may fail to explain the task correctly and may misguide the agent in unseen environments because of intrinsic covariate shift. Leveraging both expert data and lower-quality diverse data, we devise a principled algorithm, CLARE, that solves offline IRL efficiently by integrating "conservatism" into a learned reward function and utilizing an estimated dynamics model. Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy, based on which we characterize the impact of covariate shift by examining subtle two-tier tradeoffs between exploitation (on both expert and diverse data) and exploration (on the estimated dynamics model). We show that CLARE can provably alleviate the reward extrapolation error by striking the right exploitation-exploration balance. Extensive experiments corroborate the significant performance gains of CLARE over existing state-of-the-art algorithms on MuJoCo continuous control tasks (especially with a small offline dataset), and show that the learned reward is highly instructive for further learning.
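
To make the idea of a "conservative" reward concrete, the following is a minimal toy sketch, not CLARE's actual objective, which the abstract does not state. It assumes a tabular reward and uses the generic occupancy-matching update familiar from maximum-entropy IRL: the reward is raised on state-action pairs visited in the offline data and lowered on pairs the current policy visits when rolled out in the estimated dynamics model. The function name `conservative_reward_step`, the learning rate, and the toy visitation arrays are all hypothetical and chosen only for illustration.

```python
import numpy as np

def conservative_reward_step(reward, rho_data, rho_model, lr=0.1):
    """One gradient-style update on a tabular reward (illustrative assumption).

    reward, rho_data, rho_model: arrays of shape (n_states, n_actions).
    rho_data:  empirical state-action visitation of the offline data.
    rho_model: visitation of the current policy in the learned dynamics model.
    """
    # Raise reward where the data has support, lower it where only
    # model rollouts go (the "conservative" part of this sketch).
    return reward + lr * (rho_data - rho_model)

# Toy problem: 3 states, 2 actions. State 2 never appears in the data,
# but the model rollouts drift into it.
reward = np.zeros((3, 2))
rho_data = np.array([[0.5, 0.0], [0.5, 0.0], [0.0, 0.0]])
rho_model = np.array([[0.25, 0.0], [0.25, 0.0], [0.5, 0.0]])

for _ in range(10):
    reward = conservative_reward_step(reward, rho_data, rho_model)
```

In this toy setting the reward on the out-of-data pair (state 2, action 0) becomes negative while the data-supported pairs become positive, so a policy optimized against it is discouraged from exploiting regions where the reward would otherwise be extrapolated.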

research
09/15/2023

A Bayesian Approach to Robust Inverse Reinforcement Learning

We consider a Bayesian approach to offline model-based inverse reinforce...
research
05/27/2020

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning po...
research
02/15/2023

Understanding Expertise through Demonstrations: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning

Offline inverse reinforcement learning (Offline IRL) aims to recover the...
research
03/01/2023

LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Recent methods for imitation learning directly learn a Q-function using ...
research
04/22/2020

Policy Gradient from Demonstration and Curiosity

With reinforcement learning, an agent could learn complex behaviors from...
research
10/13/2022

Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief

Model-based offline reinforcement learning (RL) aims to find highly rewa...
research
08/11/2023

Learning Control Policies for Variable Objectives from Offline Data

Offline reinforcement learning provides a viable approach to obtain adva...
