A Divergence Minimization Perspective on Imitation Learning Methods

In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC) and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present f-MAX, an f-divergence generalization of AIRL [Fu et al., 2018], a state-of-the-art IRL method. f-MAX enables us to relate prior IRL methods such as GAIL [Ho & Ermon, 2016] and AIRL [Fu et al., 2018], and to understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches, and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL's state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL methods to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using only hand-specified state distributions, with no reward functions or expert demonstrations. For datasets and code for reproducing results, please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md .
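As a rough sketch of the divergence-minimization view described above (the notation here is our own: rho_exp and rho_pi denote the expert and policy state-action marginals, and f is a convex function with f(1) = 0), the unified objective can be written as

    min_pi  D_f( rho_exp(s, a) || rho_pi(s, a) ),    where    D_f(P || Q) = \int Q(x) f( P(x) / Q(x) ) dx.

Particular choices of f recover familiar methods: roughly, the Jensen-Shannon divergence corresponds to GAIL and the reverse KL divergence to AIRL, whereas BC instead minimizes a divergence between conditional action distributions rather than marginals; the precise correspondences are established in the paper.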


