State-only Imitation with Transition Dynamics Mismatch

02/27/2020
by Tanmay Gangwani, et al.

Imitation Learning (IL) is a popular paradigm for training agents to achieve complicated goals by leveraging expert behavior rather than dealing with the hardships of designing a correct reward function. With the environment modeled as a Markov Decision Process (MDP), most existing IL algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which the new imitator policy is to be learned. This assumption is uncharacteristic of many real-life scenarios, where discrepancies between the expert and the imitator MDPs are common, especially in the transition dynamics function. Furthermore, obtaining expert actions may be costly or infeasible, which makes the recent trend towards state-only IL (where expert demonstrations consist only of states or observations) all the more promising. Building on recent adversarial imitation approaches motivated by the idea of divergence minimization, we present a new state-only IL algorithm in this paper. It introduces an indirection step that divides the overall optimization objective into two subproblems, which are then solved iteratively. We show that our algorithm is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs, whereas baseline IL methods suffer from performance degradation. To analyze this, we construct several MDPs with altered transition dynamics by modifying the configuration parameters of the MuJoCo locomotion tasks from OpenAI Gym.
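For readers who want to reproduce the dynamics-mismatch setting described above, the sketch below shows one way to perturb the transition dynamics of a Gym MuJoCo locomotion task by editing its model parameters. This is a minimal illustration, not the authors' exact configuration: the environment id, the gravity and body-mass scaling factors, and the helper name `make_mismatched_env` are assumptions chosen for clarity, and it presumes a mujoco_py-backed Gym installation.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact setup):
# build an imitator MDP whose transition dynamics differ from the expert's
# by scaling MuJoCo model parameters of a Gym locomotion task.
import gym
import numpy as np


def make_mismatched_env(env_id="HalfCheetah-v2",
                        gravity_scale=1.5,
                        mass_scale=0.75):
    """Return a Gym MuJoCo env with perturbed transition dynamics."""
    env = gym.make(env_id)
    model = env.unwrapped.model  # underlying MuJoCo model

    # Scaling gravity and body masses changes p(s' | s, a) while leaving
    # the state/action spaces and the reward function untouched.
    model.opt.gravity[:] = np.asarray(model.opt.gravity) * gravity_scale
    model.body_mass[:] = np.asarray(model.body_mass) * mass_scale
    return env


# State-only expert demonstrations would be collected in the unmodified MDP,
# while the imitator policy is trained in the mismatched one.
expert_env = gym.make("HalfCheetah-v2")
imitator_env = make_mismatched_env("HalfCheetah-v2")
```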


