Reparameterized Variational Divergence Minimization for Stable Imitation

06/18/2020
by   Dilip Arumugam, et al.

While state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories only contain expert observations, have not met with the same success. Inspired by recent investigations of f-divergence manipulation for the standard imitation learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we here examine the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. We unfortunately find that f-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning to alleviate the optimization challenges of the promising f-divergence minimization framework. Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.

research
05/08/2021

RAIL: A modular framework for Reinforcement-learning-based Adversarial Imitation Learning

While Adversarial Imitation Learning (AIL) algorithms have recently led ...
research
11/06/2019

A Divergence Minimization Perspective on Imitation Learning Methods

In many settings, it is desirable to learn decision-making and control p...
research
02/09/2022

Imitation Learning by State-Only Distribution Matching

Imitation Learning from observation describes policy learning in a simil...
research
08/13/2018

Risk-Sensitive Generative Adversarial Imitation Learning

We study risk-sensitive imitation learning where the agent's goal is to ...
research
05/30/2019

Imitation Learning as f-Divergence Minimization

We address the problem of imitation learning with multi-modal demonstrat...
research
10/05/2021

A Critique of Strictly Batch Imitation Learning

Recent work by Jarrett et al. attempts to frame the problem of offline i...
research
08/20/2020

Imitation Learning with Sinkhorn Distances

Imitation learning algorithms have been interpreted as variants of diver...
