Imitation Learning by State-Only Distribution Matching

02/09/2022
by Damian Boborzi, et al.

Imitation learning from observation describes policy learning in a way that resembles human learning: an agent's policy is trained by observing an expert performing a task. While many state-only imitation learning approaches are based on adversarial imitation learning, one main drawback is that adversarial training is often unstable and lacks a reliable convergence estimator. If the true environment reward is unknown and cannot be used to select the best-performing model, this can result in poor real-world policy performance. We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric. Our training objective minimizes the Kullback-Leibler divergence (KLD) between the policy and expert state transition trajectories, which can be optimized in a non-adversarial fashion. Such methods demonstrate improved robustness when learned density models guide the optimization. We further improve the sample efficiency by rewriting the KLD minimization as the Soft Actor-Critic objective based on a modified reward, using additional density models that estimate the environment's forward and backward dynamics. Finally, we evaluate the effectiveness of our approach on well-known continuous control environments and show state-of-the-art performance compared to several recent learning-from-observation methods, while also providing a reliable performance estimator.
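The central quantity in the abstract, a KLD between policy and expert state transitions estimated with learned density models, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a one-dimensional "transition" (the scalar change s' - s), uses maximum-likelihood Gaussians as stand-ins for the learned density models, and all function names (`fit_gaussian`, `log_prob`, `kl_estimate`, `demo`) are hypothetical. It only shows how a Monte Carlo estimate of the divergence shrinks as the policy's transitions approach the expert's.

```python
import math
import random


def fit_gaussian(samples):
    """Fit a 1-D Gaussian density model (mean, std) by maximum likelihood.

    Assumption: a Gaussian stands in for whatever density model is learned.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(var)


def log_prob(x, mean, std):
    """Log-density of a 1-D Gaussian evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def kl_estimate(policy_samples, policy_model, expert_model):
    """Monte Carlo estimate of KL(policy || expert) using policy samples."""
    diffs = [
        log_prob(x, *policy_model) - log_prob(x, *expert_model)
        for x in policy_samples
    ]
    return sum(diffs) / len(diffs)


def demo(seed=0, n=5000):
    """Compare a policy far from the expert with one close to it."""
    rng = random.Random(seed)
    # Toy stand-ins for state transitions (hypothetical data, not from the paper).
    expert = [rng.gauss(1.0, 0.5) for _ in range(n)]
    far_policy = [rng.gauss(0.0, 0.5) for _ in range(n)]
    near_policy = [rng.gauss(0.9, 0.5) for _ in range(n)]

    expert_model = fit_gaussian(expert)
    kl_far = kl_estimate(far_policy, fit_gaussian(far_policy), expert_model)
    kl_near = kl_estimate(near_policy, fit_gaussian(near_policy), expert_model)
    return kl_far, kl_near


if __name__ == "__main__":
    kl_far, kl_near = demo()
    print(f"KL(far policy || expert)  = {kl_far:.3f}")
    print(f"KL(near policy || expert) = {kl_near:.3f}")
```

Because the estimate needs only samples and density models, not the true environment reward, a quantity of this kind could plausibly be tracked during training as a convergence signal, which is the role the abstract assigns to its metric.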

research
05/19/2022

IL-flOw: Imitation Learning from Observation using Normalizing Flows

We present an algorithm for Inverse Reinforcement Learning (IRL) from ex...
research
08/04/2021

A Pragmatic Look at Deep Imitation Learning

The introduction of the generative adversarial imitation learning (GAIL)...
research
09/06/2018

Sample-Efficient Imitation Learning via Generative Adversarial Nets

Recent work in imitation learning articulate their formulation around th...
research
06/19/2022

Robust Imitation Learning against Variations in Environment Dynamics

In this paper, we propose a robust imitation learning (IL) framework tha...
research
09/17/2020

Evolutionary Selective Imitation: Interpretable Agents by Imitation Learning Without a Demonstrator

We propose a new method for training an agent via an evolutionary strate...
research
06/18/2020

Reparameterized Variational Divergence Minimization for Stable Imitation

While recent state-of-the-art results for adversarial imitation-learning...
research
06/23/2021

IQ-Learn: Inverse soft-Q Learning for Imitation

In many sequential decision-making problems (e.g., robotics control, gam...
