Offline Inverse Reinforcement Learning

06/09/2021
by Firas Jarboui, et al.

The objective of offline RL is to learn optimal policies when only a fixed data set of exploratory demonstrations is available and sampling additional observations is impossible (typically because doing so is costly or raises ethical questions). Off-the-shelf approaches to this problem require a properly defined cost function (or its evaluation on the provided data set), which is seldom available in practice. A reasonable alternative is to query an expert for a few optimal demonstrations in addition to the exploratory data set. The objective is then to learn a policy that is optimal with respect to the expert's latent cost function. Current solutions either solve a behaviour cloning problem, which does not leverage the exploratory data, or a reinforced imitation learning problem, which uses a fixed cost function that discriminates the available exploratory trajectories from the expert ones. Inspired by the success of IRL techniques in achieving state-of-the-art imitation performance in online settings, we exploit GAN-based data augmentation procedures to construct the first offline IRL algorithm. The resulting policies outperform the aforementioned solutions on multiple OpenAI Gym environments.
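To make the reinforced imitation learning baseline concrete (this is the baseline the abstract contrasts against, not the offline IRL algorithm it proposes), here is a minimal sketch of a fixed discriminator-based cost. It rests on our own assumptions: state-action pairs are already flattened into feature vectors, the discriminator is plain logistic regression trained by gradient descent, and the cost is taken as c(s, a) = -log D(s, a). The names `fit_discriminator` and `fixed_cost` are hypothetical.

```python
import numpy as np


def fit_discriminator(expert, exploratory, lr=0.1, steps=2000):
    """Fit a logistic-regression discriminator D(x) = sigmoid(w.x + b).

    Assumption: each row of `expert` / `exploratory` is a flattened
    state-action feature vector. Expert rows are labelled 1, exploratory 0.
    """
    x = np.vstack([expert, exploratory])
    y = np.concatenate([np.ones(len(expert)), np.zeros(len(exploratory))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(expert | x)
        err = p - y  # gradient of binary cross-entropy w.r.t. the logits
        w -= lr * x.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def fixed_cost(w, b, x):
    """Cost c(x) = -log D(x): small on expert-like pairs, large elsewhere."""
    logits = x @ w + b
    # -log sigmoid(z) = log(1 + exp(-z)), computed in a numerically stable way
    return np.logaddexp(0.0, -logits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for the two data sets (illustrative only)
    expert = rng.normal(1.0, 0.5, size=(200, 2))
    exploratory = rng.normal(-1.0, 0.5, size=(200, 2))
    w, b = fit_discriminator(expert, exploratory)
    print("mean expert cost:", fixed_cost(w, b, expert).mean())
    print("mean exploratory cost:", fixed_cost(w, b, exploratory).mean())
```

Once fitted, such a cost can relabel every transition in the exploratory data set so that an off-the-shelf offline RL solver can run on it. The cost stays fixed after this single fitting step, and that is the property the abstract highlights for this family of baselines.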

research
06/26/2023

CEIL: Generalized Contextual Imitation Learning

In this paper, we present ContExtual Imitation Learning (CEIL), a genera...
research
05/25/2021

A Generalised Inverse Reinforcement Learning Framework

The global objective of inverse Reinforcement Learning (IRL) is to esti...
research
07/20/2022

Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations

We study the problem of offline Imitation Learning (IL) where an agent a...
research
02/04/2022

SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

We propose State Matching Offline DIstribution Correction Estimation (SM...
research
07/20/2017

RAIL: Risk-Averse Imitation Learning

Imitation learning algorithms learn viable policies by imitating an expe...
research
07/31/2022

Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination

Learning robotic tasks in the real world is still highly challenging and...
research
06/09/2022

Receding Horizon Inverse Reinforcement Learning

Inverse reinforcement learning (IRL) seeks to infer a cost function that...

Please sign up or login with your details

Forgot password? Click here to reset