SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

02/04/2022
by   Yecheng Jason Ma, et al.

We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile algorithm for offline imitation learning (IL) via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality, and an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with a dynamics- or morphologically-mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments and high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective in all three problem settings and significantly outperforms the prior state of the art.
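To make the tabular setting concrete, the sketch below computes discounted state occupancies d(s) exactly in a small made-up MDP and evaluates a KL state-occupancy-matching objective between a learner and an expert policy. This is a minimal illustration of the general idea, not the paper's SMODICE algorithm: the MDP, the policies, and the choice of KL divergence are all assumptions for the example, and note that the objective uses only the expert's state visitations, never its actions.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP; all quantities are illustrative.
gamma = 0.95
mu0 = np.array([1.0, 0.0, 0.0])  # initial state distribution

# P[a, s, s'] = probability of landing in s' after taking action a in state s
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],  # action 0
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.1, 0.0]],  # action 1
])

def state_occupancy(pi, P, mu0, gamma):
    """Solve the occupancy fixed point d = (1 - gamma) * mu0 + gamma * P_pi^T d,
    where P_pi[s, s'] = sum_a pi[s, a] * P[a, s, s']."""
    P_pi = np.einsum('sa,asp->sp', pi, P)
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi.T, (1 - gamma) * mu0)

pi_expert = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])  # always action 0
pi_learner = np.full((3, 2), 0.5)                           # uniform policy

d_E = state_occupancy(pi_expert, P, mu0, gamma)
d_L = state_occupancy(pi_learner, P, mu0, gamma)

# State-occupancy matching objective: KL(d_L || d_E). Expert actions never
# appear, which is what makes the observation-only settings possible.
kl = np.sum(d_L * np.log(d_L / d_E))
print(d_E, d_L, kl)
```

Because the occupancy equation is linear in the tabular case, a direct linear solve recovers d(s) exactly; in continuous settings this closed form is unavailable, which is where the Fenchel-duality-based optimization comes in.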


Related research

06/26/2023  CEIL: Generalized Contextual Imitation Learning
  In this paper, we present ContExtual Imitation Learning (CEIL), a genera...

06/25/2020  Strictly Batch Imitation Learning by Energy-based Distribution Matching
  Consider learning a policy purely on the basis of demonstrated behavior...

06/06/2021  SoftDICE for Imitation Learning: Rethinking Off-policy Distribution Matching
  We present SoftDICE, which achieves state-of-the-art performance for imi...

02/16/2023  Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods
  It is well known that Reinforcement Learning (RL) can be formulated as a...

10/17/2022  Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors
  Humans intuitively solve tasks in versatile ways, varying their behavior...

06/09/2021  Offline Inverse Reinforcement Learning
  The objective of offline RL is to learn optimal policies when a fixed ex...

02/02/2022  Causal Imitation Learning under Temporally Correlated Noise
  We develop algorithms for imitation learning from policy data that was c...
