Non-Adversarial Imitation Learning and its Connections to Adversarial Methods

08/08/2020
by Oleg Arenz et al.

Many modern methods for imitation learning and inverse reinforcement learning, such as GAIL or AIRL, are based on an adversarial formulation. These methods apply GANs to match the expert's distribution over states and actions with the implicit state-action distribution induced by the agent's policy. However, because they frame imitation learning as a saddle-point problem, adversarial methods can suffer from unstable optimization, and convergence can be shown only for small policy updates. We address these problems by proposing a framework for non-adversarial imitation learning. The resulting algorithms are similar to their adversarial counterparts and thus provide insights into adversarial imitation learning methods. Most notably, we show that AIRL is an instance of our non-adversarial formulation, which enables us to greatly simplify its derivations and to obtain stronger convergence guarantees. We also show that our non-adversarial formulation can be used to derive novel algorithms, by presenting a method for offline imitation learning that is inspired by the recent ValueDice algorithm but does not rely on small policy updates for convergence. In our simulated robot experiments, our offline method for non-adversarial imitation learning seems to perform best when using many updates for policy and discriminator at each iteration, and it outperforms behavioral cloning and ValueDice.
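To make the adversarial formulation the abstract contrasts with concrete, here is a minimal toy sketch of GAIL-style alternating updates. It is not the paper's method: it assumes a 1-D Gaussian "occupancy" for both expert and agent, a logistic discriminator, and hand-derived gradients, purely to illustrate the saddle-point structure (discriminator ascends a classification objective while the policy ascends the GAN reward log D).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative, not from the paper): "state-actions" are scalars.
# The expert's occupancy is a unit-variance Gaussian at 2.0; the agent's
# policy is a Gaussian whose mean is the only learnable parameter.
expert_mean = 2.0
agent_mean = -1.0            # initial policy parameter
w, b = 0.0, 0.0              # logistic discriminator D(x) = sigmoid(w*x + b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    xe = rng.normal(expert_mean, 1.0, 64)   # expert samples
    xa = rng.normal(agent_mean, 1.0, 64)    # agent (policy) samples

    # Discriminator step: gradient ascent on E_e[log D] + E_a[log(1 - D)]
    de, da = sigmoid(w * xe + b), sigmoid(w * xa + b)
    w += 0.05 * (np.mean((1 - de) * xe) - np.mean(da * xa))
    b += 0.05 * (np.mean(1 - de) - np.mean(da))

    # Policy step: gradient ascent on the GAN reward E_a[log D(x)],
    # using reparameterised samples x = agent_mean + noise, so
    # d/d(mean) log D(x) = (1 - D(x)) * w for each sample.
    da = sigmoid(w * xa + b)
    agent_mean += 0.05 * np.mean(1 - da) * w

# After training, agent_mean has been pulled toward the expert's mean and
# the discriminator's advantage has shrunk.
```

The alternating structure is exactly the saddle point the abstract refers to: neither player optimizes a fixed objective, which is why convergence arguments for such schemes typically require small policy updates.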


