Reparameterized Variational Divergence Minimization for Stable Imitation

by Dilip Arumugam, et al.

While state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories contain only expert observations, have not met with the same success. Inspired by recent investigations of f-divergence manipulation for the standard imitation learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we examine the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. Unfortunately, we find that f-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning that alleviates the optimization challenges of the promising f-divergence minimization framework. Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
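
For readers unfamiliar with the f-divergence framework mentioned above, the following is a minimal sketch of the standard variational lower bound that adversarial f-divergence methods typically optimize, D_f(P||Q) >= E_P[T(x)] - E_Q[f*(T(x))], shown for the KL case. It is not the reparameterization proposed in the paper. The toy distributions `p` and `q`, the function names, and the choice of KL as the divergence are all illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    """Exact KL(P || Q) for discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variational_bound(p, q, t):
    """Lower bound E_P[T] - E_Q[f*(T)] for the KL generator f(u) = u log u.

    The convex conjugate of f is f*(t) = exp(t - 1). Any critic values `t`
    give a lower bound; the bound is tight at t = f'(p/q) = 1 + log(p/q).
    """
    e_p = sum(pi * ti for pi, ti in zip(p, t))
    e_q = sum(qi * math.exp(ti - 1.0) for qi, ti in zip(q, t))
    return e_p - e_q

# Hypothetical toy distributions (assumption, not data from the paper).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

t_opt = [1.0 + math.log(pi / qi) for pi, qi in zip(p, q)]  # optimal critic
t_zero = [0.0, 0.0, 0.0]                                   # arbitrary critic

exact = kl_divergence(p, q)
tight = variational_bound(p, q, t_opt)
loose = variational_bound(p, q, t_zero)
print(exact, tight, loose)
```

With the optimal critic the bound matches the exact divergence, while any other critic yields a smaller value. Note that the conjugate term for KL is exponential in the critic output, so an unbounded critic can produce very large values in this term.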

