Generative Adversarial Self-Imitation Learning

12/03/2018
by Yijie Guo, et al.

This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate its own past good trajectories via the generative adversarial imitation learning (GAIL) framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can make long-term credit assignment easier when rewards are sparse and delayed. GASIL can easily be combined with any policy gradient objective by using it as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on a 2D Point Mass task and on MuJoCo environments with delayed rewards and stochastic dynamics.
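To make the idea concrete, the sketch below illustrates one way the GASIL recipe could be wired up: keep a small buffer of the highest-return trajectories seen so far, train a discriminator to tell those transitions apart from the current policy's, and add the discriminator's score to the environment reward before running an ordinary policy-gradient update (e.g., PPO). This is a minimal illustration under assumptions, not the authors' implementation; the buffer size, the coefficient `alpha`, the log-sigmoid form of the bonus, and helper names such as `TopKTrajectoryBuffer` are illustrative choices.

```python
# Minimal GASIL-style sketch (illustrative, not the paper's code).
import heapq
import torch
import torch.nn as nn


class TopKTrajectoryBuffer:
    """Keeps the K highest-return trajectories as (return, transitions) pairs."""

    def __init__(self, k=10):
        self.k = k
        self._heap = []      # min-heap keyed on trajectory return
        self._counter = 0    # tie-breaker so heapq never compares tensors

    def add(self, traj_return, transitions):
        # transitions: tensor of shape (T, obs_dim + act_dim)
        item = (traj_return, self._counter, transitions)
        self._counter += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
        else:
            heapq.heappushpop(self._heap, item)  # drop the worst trajectory

    def sample(self, n):
        all_t = torch.cat([t for _, _, t in self._heap], dim=0)
        idx = torch.randint(len(all_t), (n,))
        return all_t[idx]


class Discriminator(nn.Module):
    """Scores (state, action) pairs; high output = resembles a good trajectory."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, sa):
        return self.net(sa)


def shaped_reward(disc, sa, env_reward, alpha=0.1):
    """Shaped reward: environment reward plus a discriminator bonus.
    Uses log(sigmoid(D(s, a))) as the imitation term, one common GAIL variant."""
    with torch.no_grad():
        bonus = torch.log(torch.sigmoid(disc(sa)) + 1e-8).squeeze(-1)
    return env_reward + alpha * bonus


def discriminator_update(disc, opt, good_sa, policy_sa):
    """Binary-logistic GAIL-style update: transitions from the top-K buffer are
    labeled 1, transitions from the current policy are labeled 0."""
    bce = nn.BCEWithLogitsLoss()
    loss = (bce(disc(good_sa), torch.ones(len(good_sa), 1))
            + bce(disc(policy_sa), torch.zeros(len(policy_sa), 1)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In use, each completed episode would be offered to the buffer with its return; the shaped reward from `shaped_reward` then replaces the raw environment reward in the policy-gradient loss, so the agent is pulled toward behavior the discriminator judges similar to its own best past trajectories.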

