Imitation Learning from Observations under Transition Model Disparity

04/25/2022
by Tanmay Gangwani, et al.

Learning to perform tasks by leveraging a dataset of expert observations, also known as imitation learning from observations (ILO), is an important paradigm for learning skills without access to the expert reward function or the expert actions. We consider ILO in the setting where the expert and the learner agents operate in different environments, with the source of the discrepancy being the transition dynamics model. Recent methods for scalable ILO utilize adversarial learning to match the state-transition distributions of the expert and the learner, an approach that becomes challenging when the dynamics are dissimilar. In this work, we propose an algorithm that trains an intermediary policy in the learner environment and uses it as a surrogate expert for the learner. The intermediary policy is learned such that the state transitions generated by it are close to the state transitions in the expert dataset. To derive a practical and scalable algorithm, we employ concepts from prior work on estimating the support of a probability distribution. Experiments using MuJoCo locomotion tasks highlight that our method compares favorably to the baselines for ILO with transition dynamics mismatch.
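
The abstract does not say which support estimator the method uses, so the following is only a toy illustration of the general idea: score a state transition (s, s') by how close it lies to the transitions in an expert dataset. The nearest-neighbour score, the `bandwidth` parameter, the function names, and the 1-D data are all assumptions made for this sketch and are not taken from the paper.

```python
import math
import random


def transition(s, s_next):
    """Concatenate a state and its successor into one feature vector."""
    return tuple(s) + tuple(s_next)


def nearest_distance(x, dataset):
    """Euclidean distance from x to its nearest neighbour in dataset."""
    return min(math.dist(x, y) for y in dataset)


def support_reward(x, expert_transitions, bandwidth=0.5):
    """Score in [0, 1]: near 1 close to expert transitions, near 0 far away.

    Hypothetical stand-in for a learned support estimator.
    """
    d = nearest_distance(x, expert_transitions)
    return math.exp(-((d / bandwidth) ** 2))


# Toy expert data (assumed): a 1-D state that advances by about +1.0 per step.
rng = random.Random(0)
expert_transitions = []
for _ in range(200):
    s = rng.uniform(0.0, 10.0)
    s_next = s + 1.0 + rng.gauss(0.0, 0.05)
    expert_transitions.append(transition([s], [s_next]))

# Two candidate transitions produced in the learner environment.
on_support = transition([5.0], [6.0])   # resembles the expert's motion
off_support = transition([5.0], [2.0])  # moves in the opposite direction

r_on = support_reward(on_support, expert_transitions)
r_off = support_reward(off_support, expert_transitions)
print(f"on-support score:  {r_on:.3f}")
print(f"off-support score: {r_off:.3g}")
```

A score of this kind could serve as a reward signal for an intermediary policy trained in the learner environment, pushing its state transitions toward those in the expert dataset. A practical system would replace the brute-force nearest-neighbour search with a learned estimator that scales to high-dimensional states.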

research
02/12/2022

Robust Learning from Observation with Model Misspecification

Imitation learning (IL) is a popular paradigm for training policies in r...
research
02/27/2020

State-only Imitation with Transition Dynamics Mismatch

Imitation Learning (IL) is a popular paradigm for training agents to ach...
research
07/02/2020

Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch

We study the inverse reinforcement learning (IRL) problem under the tran...
research
05/19/2022

IL-flOw: Imitation Learning from Observation using Normalizing Flows

We present an algorithm for Inverse Reinforcement Learning (IRL) from ex...
research
02/02/2022

Causal Imitation Learning under Temporally Correlated Noise

We develop algorithms for imitation learning from policy data that was c...
research
06/19/2021

Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions

This paper is dedicated to designing provably efficient adversarial imit...
research
08/03/2022

Sequence Model Imitation Learning with Unobserved Contexts

We consider imitation learning problems where the expert has access to a...
