Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

10/01/2018
by Xue Bin Peng, et al.

Adversarial learning methods have been proposed for a wide range of applications, but the training of adversarial models can be notoriously unstable. Effectively balancing the performance of the generator and discriminator is critical, since a discriminator that achieves very high accuracy will produce relatively uninformative gradients. In this work, we propose a simple and general technique to constrain information flow in the discriminator by means of an information bottleneck. By enforcing a constraint on the mutual information between the observations and the discriminator's internal representation, we can effectively modulate the discriminator's accuracy and maintain useful and informative gradients. We demonstrate that our proposed variational discriminator bottleneck (VDB) leads to significant improvements across three distinct application areas for adversarial learning algorithms. Our primary evaluation studies the applicability of the VDB to imitation learning of dynamic continuous control skills, such as running. We show that our method can learn such skills directly from raw video demonstrations, substantially outperforming prior adversarial imitation learning methods. The VDB can also be combined with adversarial inverse reinforcement learning to learn parsimonious reward functions that can be transferred and re-optimized in new settings. Finally, we demonstrate that VDB can train GANs more effectively for image generation, improving upon a number of prior stabilization methods.
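
The information constraint described above can be illustrated with a minimal sketch. The abstract does not spell out the exact objective, so everything here is an assumption made for illustration: a stochastic encoder that outputs a diagonal Gaussian over the discriminator's internal representation, a standard normal prior, a KL-divergence upper bound on the mutual information, and a Lagrange multiplier adjusted by dual gradient ascent. The names `i_c` (information budget), `beta` (multiplier), and `step_size` are hypothetical.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """Per-sample KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    Assumption: the encoder emits a mean and log-variance for each observation.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def bottleneck_penalty(mu, log_var, beta, i_c):
    """Return (beta * (mean KL - i_c), mean KL).

    The first value is the term that would be added to the usual
    discriminator loss; i_c is an assumed information budget.
    """
    mean_kl = float(np.mean(gaussian_kl_to_standard_normal(mu, log_var)))
    return beta * (mean_kl - i_c), mean_kl


def update_beta(beta, mean_kl, i_c, step_size):
    """Dual gradient ascent on the multiplier, clipped so it stays non-negative."""
    return max(0.0, beta + step_size * (mean_kl - i_c))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = rng.normal(size=(8, 4))          # hypothetical encoder means
    log_var = np.zeros((8, 4))            # hypothetical encoder log-variances
    penalty, mean_kl = bottleneck_penalty(mu, log_var, beta=0.1, i_c=0.5)
    new_beta = update_beta(0.1, mean_kl, i_c=0.5, step_size=1e-3)
    print(penalty, mean_kl, new_beta)
```

In this sketch the multiplier grows while the encoder passes more information than the budget allows and shrinks toward zero otherwise, which is one way the discriminator's accuracy could be modulated as the abstract describes.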

research
06/23/2020

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Adversarial imitation learning alternates between learning a discriminat...
research
12/09/2018

Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

The performance of adversarial dialogue generation models relies on the ...
research
06/30/2021

Robust Generative Adversarial Imitation Learning via Local Lipschitzness

We explore methodologies to improve the robustness of generative adversa...
research
03/08/2021

Domain-Robust Visual Imitation Learning with Mutual Information Constraints

Human beings are able to understand objectives and learn by simply obser...
research
02/02/2020

Combating False Negatives in Adversarial Imitation Learning

In adversarial imitation learning, a discriminator is trained to differe...
research
06/11/2021

To Beam Or Not To Beam: That is a Question of Cooperation for Language GANs

Due to the discrete nature of words, language GANs require to be optimiz...
