Self-Imitation Advantage Learning

12/22/2020
by Johan Ferret, et al.

Self-imitation learning is a Reinforcement Learning (RL) method that encourages actions whose returns were higher than expected, which helps in hard-exploration and sparse-reward problems. It was shown to improve the performance of on-policy actor-critic methods in several discrete control tasks. Nevertheless, applying self-imitation to off-policy RL methods, which are mostly based on action-values, is not straightforward. We propose SAIL, a novel generalization of self-imitation learning for off-policy RL, based on a modification of the Bellman optimality operator that we connect to Advantage Learning. Crucially, our method mitigates the problem of stale returns by using, for self-imitation, the more optimistic of two return estimates: the observed return and the current action-value. We demonstrate the empirical effectiveness of SAIL on the Arcade Learning Environment, with a focus on hard-exploration games.
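The abstract describes SAIL only at a high level. As a rough illustration, the sketch below computes a regression target for a single transition, assuming that SAIL adds an Advantage-Learning-style correction to the Bellman optimality target and that, in this correction, the action-value of the taken action is replaced by the larger of the observed return and the current action-value. The function name `sail_target` and its parameters (including the correction coefficient `alpha`) are hypothetical choices for this example, not the paper's code.

```python
from typing import Sequence


def sail_target(
    reward: float,
    discount: float,
    q_next: Sequence[float],   # Q(x', b) for every action b (e.g. from a target network)
    q_curr: Sequence[float],   # Q(x, b) for every action b at the current state
    action: int,               # index of the action a actually taken at x
    observed_return: float,    # return G observed from (x, a), e.g. stored in a replay buffer
    alpha: float,              # hypothetical correction coefficient (assumption)
    done: bool = False,        # True if x' is terminal, so there is no bootstrap term
) -> float:
    """Sketch of a SAIL-style target under the assumed operator form:
    r + gamma * max_b Q(x', b) + alpha * (max(G, Q(x, a)) - max_b Q(x, b)).
    """
    # Bellman optimality target: r + gamma * max_b Q(x', b)
    bootstrap = 0.0 if done else discount * max(q_next)
    bellman = reward + bootstrap
    # Optimistic estimate for the taken action: the larger of the
    # observed return and the current action-value (mitigates stale returns)
    optimistic = max(observed_return, q_curr[action])
    # Greedy state value V(x) = max_b Q(x, b)
    value = max(q_curr)
    # Advantage-Learning-style correction built on the optimistic estimate
    return bellman + alpha * (optimistic - value)


if __name__ == "__main__":
    # Observed return (5.0) exceeds Q(x, a) = 1.0, so it drives the correction.
    print(sail_target(1.0, 0.9, [0.0, 2.0], [1.0, 3.0], 0, 5.0, alpha=0.5))
```

Under this assumed form, passing a very low observed return (for example `float("-inf")` when no return is available) makes the correction reduce to the plain Advantage Learning term `alpha * (Q(x, a) - V(x))`, while a high observed return pushes the target for the taken action upward, which is the self-imitation effect.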


06/14/2018

Self-Imitation Learning

This paper proposes Self-Imitation Learning (SIL), a simple off-policy a...
06/05/2022

ARC – Actor Residual Critic for Adversarial Imitation Learning

Adversarial Imitation Learning (AIL) is a class of popular state-of-the-...
02/24/2022

All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

Upside down reinforcement learning (UDRL) flips the conventional use of ...
01/17/2019

Amplifying the Imitation Effect for Reinforcement Learning of UCAV's Mission Execution

This paper proposes a new reinforcement learning (RL) algorithm that enh...
12/07/2021

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

Learning rational behaviors in open-world games like Minecraft remains t...
02/10/2021

Learning Equational Theorem Proving

We develop Stratified Shortest Solution Imitation Learning (3SIL) to lea...
11/03/2021

Smooth Imitation Learning via Smooth Costs and Smooth Policies

Imitation learning (IL) is a popular approach in the continuous control ...
