Reparameterized Variational Divergence Minimization for Stable Imitation

06/18/2020
by   Dilip Arumugam, et al.
0

While recent state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories only contain expert observations, have not been met with the same success. Inspired by recent investigations of f-divergence manipulation for the standard imitation learning setting(Ke et al., 2019; Ghasemipour et al., 2019), we here examine the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. We unfortunately find that f-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning to alleviate the optimization challenges of the promising f-divergence minimization framework. Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.

READ FULL TEXT
05/08/2021

RAIL: A modular framework for Reinforcement-learning-based Adversarial Imitation Learning

While Adversarial Imitation Learning (AIL) algorithms have recently led ...
11/06/2019

A Divergence Minimization Perspective on Imitation Learning Methods

In many settings, it is desirable to learn decision-making and control p...
02/09/2022

Imitation Learning by State-Only Distribution Matching

Imitation Learning from observation describes policy learning in a simil...
08/13/2018

Risk-Sensitive Generative Adversarial Imitation Learning

We study risk-sensitive imitation learning where the agent's goal is to ...
05/30/2019

Imitation Learning as f-Divergence Minimization

We address the problem of imitation learning with multi-modal demonstrat...
12/14/2021

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

We consider the task of building strong but human-like policies in multi...
11/13/2015

Neuroprosthetic decoder training as imitation learning

Neuroprosthetic brain-computer interfaces function via an algorithm whic...