LobsDICE: Offline Imitation Learning from Observation via Stationary Distribution Correction Estimation

02/28/2022
by Geon-Hyeong Kim, et al.

We consider the problem of imitation from observation (IfO), in which the agent aims to mimic an expert's behavior from state-only expert demonstrations. We additionally assume that the agent cannot interact with the environment but has access to action-labeled transition data collected by some agent of unknown quality. This offline setting for IfO is appealing in many real-world scenarios where the ground-truth expert actions are inaccessible and arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline IfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. On an extensive set of offline IfO tasks, LobsDICE shows promising results, outperforming strong baseline algorithms.
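
To make the quantity in the objective concrete, the sketch below estimates state-transition distributions over (s, s') pairs from state-only trajectories in a small tabular setting and compares them with a KL divergence. This is only an illustration and not the LobsDICE algorithm: the function names, the undiscounted empirical counts, the choice of KL as the divergence, and the `eps` floor for unseen pairs are all assumptions introduced here.

```python
from collections import Counter
import math


def transition_distribution(trajectories):
    """Empirical distribution over (s, s') pairs from state-only trajectories.

    Assumption: plain undiscounted counts stand in for the stationary
    state-transition distribution; actions are not needed.
    """
    counts = Counter()
    for states in trajectories:
        # Pair each state with its successor in the same trajectory.
        counts.update(zip(states[:-1], states[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) over the support of p.

    Assumption: pairs missing from q are floored at eps so the value
    stays finite; this is an illustrative choice, not part of the paper.
    """
    return sum(
        p_x * math.log(p_x / max(q.get(x, 0.0), eps))
        for x, p_x in p.items()
    )


# Hypothetical toy data: states are integers, each list is one trajectory.
expert_trajs = [[0, 1, 2, 0, 1, 2]]
agent_trajs = [[0, 0, 0]]

d_expert = transition_distribution(expert_trajs)
d_agent = transition_distribution(agent_trajs)

# Zero when the two distributions match, large when the agent visits
# transitions the expert never makes.
same = kl_divergence(d_expert, d_expert)
different = kl_divergence(d_agent, d_expert)
```

In this toy case the agent only ever makes the transition (0, 0), which the expert never makes, so the divergence is large; an agent whose transitions match the expert's yields a divergence of zero.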


research
05/02/2023

Get Back Here: Robust Imitation by Return-to-Distribution Planning

We consider the Imitation Learning (IL) setup where expert data are not ...
research
06/06/2021

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

This paper studies offline Imitation Learning (IL) where an agent learns...
research
02/21/2020

GenDICE: Generalized Offline Estimation of Stationary Values

An important problem that arises in reinforcement learning and Monte Car...
research
10/27/2021

TRAIL: Near-Optimal Imitation Learning with Suboptimal Data

The aim in imitation learning is to learn effective policies by utilizin...
research
07/01/2020

Policy Improvement from Multiple Experts

Despite its promise, reinforcement learning's real-world adoption has be...
research
05/23/2022

Data augmentation for efficient learning from parametric experts

We present a simple, yet powerful data-augmentation technique to enable ...
research
02/27/2020

State-only Imitation with Transition Dynamics Mismatch

Imitation Learning (IL) is a popular paradigm for training agents to ach...
