Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

06/23/2020
by Paul Barde, et al.

Adversarial imitation learning alternates between training a discriminator – which distinguishes expert demonstrations from generated ones – and updating a generator policy to produce trajectories that fool this discriminator. This alternating optimization is known to be delicate in practice, since it compounds unstable adversarial training with brittle and sample-inefficient reinforcement learning. We propose to remove the burden of the policy optimization steps by leveraging a novel discriminator formulation. Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator iteration and a learnable policy. When optimized, this discriminator directly learns the optimal generator policy. Consequently, the discriminator update solves the generator's optimization problem for free: learning a policy that imitates the expert requires no additional optimization loop. This formulation effectively cuts the implementation and computational burden of adversarial imitation learning algorithms in half by removing the reinforcement learning phase altogether. We show on a variety of tasks that our simpler approach is competitive with prevalent imitation learning methods.
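The core idea can be illustrated on a toy problem. Below is a minimal numpy sketch of a discriminator structured as D(s, a) = pi(a|s) / (pi(a|s) + pi_tilde(a|s)), conditioned on a learnable policy pi and the previous generator policy pi_tilde, fitted with a binary cross-entropy loss on individual transitions. The toy setup, names, and finite-difference optimizer are my own assumptions for illustration, not details from the paper:

```python
import numpy as np

# Toy setup (hypothetical, not from the paper): 2 discrete states,
# 2 actions; the "expert" always picks action 1.
n_states, n_actions = 2, 2

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Expert demonstrations and transitions from the previous generator
# policy pi_tilde (here uniform), as (state, action) pairs.
expert_sa = [(s, 1) for s in range(n_states) for _ in range(50)]
gen_sa = [(s, a) for s in range(n_states) for a in range(n_actions)
          for _ in range(25)]
pi_tilde = np.full((n_states, n_actions), 1.0 / n_actions)

def bce_loss(logits):
    """Binary cross-entropy of the structured discriminator
    D(s, a) = pi(a|s) / (pi(a|s) + pi_tilde(a|s))."""
    pi = softmax(logits)
    d = pi / (pi + pi_tilde)
    loss_expert = -np.mean([np.log(d[s, a]) for s, a in expert_sa])
    loss_gen = -np.mean([np.log(1.0 - d[s, a]) for s, a in gen_sa])
    return loss_expert + loss_gen

# Fit the discriminator by finite-difference gradient descent on the
# policy logits; the fitted pi is itself the next generator policy,
# so no separate reinforcement learning step is needed.
logits = np.zeros((n_states, n_actions))
eps, lr = 1e-5, 1.0
for _ in range(300):
    base = bce_loss(logits)
    grad = np.zeros_like(logits)
    for idx in np.ndindex(*logits.shape):
        bumped = logits.copy()
        bumped[idx] += eps
        grad[idx] = (bce_loss(bumped) - base) / eps
    logits -= lr * grad

pi = softmax(logits)
print(pi)  # pi concentrates on action 1, imitating the expert
```

Note how minimizing the discriminator's classification loss directly drives pi toward the expert's action distribution: the single optimization loop replaces the usual discriminator/policy alternation.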


Related research

Non-Adversarial Imitation Learning and its Connections to Adversarial Methods (08/08/2020)
Adversarial Imitation Learning from Incomplete Demonstrations (05/29/2019)
Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow (10/01/2018)
DIRECT: Learning from Sparse and Shifting Rewards using Discriminative Reward Co-Training (01/18/2023)
Combating False Negatives in Adversarial Imitation Learning (02/02/2020)
Robust Generative Adversarial Imitation Learning via Local Lipschitzness (06/30/2021)
Regularizing Adversarial Imitation Learning Using Causal Invariance (08/17/2023)
