Self-Imitation Learning

06/14/2018
by Junhyuk Oh, et al.

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. The algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive with state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.
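
The abstract does not spell out the objective, so the following is only a minimal sketch of one way to "reproduce past good decisions". It assumes that a stored transition counts as good when its observed return exceeds the critic's current value estimate, and that only such transitions contribute to the loss. The function name `self_imitation_losses`, its arguments, and the `value_coef` weight are hypothetical names chosen for illustration, not taken from this page.

```python
import numpy as np

def self_imitation_losses(log_probs, returns, values, value_coef=0.01):
    """Sketch of a self-imitation objective over replayed transitions.

    log_probs: log pi(a|s) of the stored actions under the current policy
    returns:   discounted returns actually observed after those actions
    values:    current critic estimates V(s) for the stored states
    value_coef is an assumed weighting, not a value stated in the abstract.
    """
    log_probs = np.asarray(log_probs, dtype=float)
    returns = np.asarray(returns, dtype=float)
    values = np.asarray(values, dtype=float)

    # Assumption: a past decision is "good" only if its return beat the
    # critic's estimate; all other transitions get zero weight.
    gap = np.maximum(returns - values, 0.0)

    # Policy term: raise the likelihood of good past actions, weighted by
    # how much better than expected they turned out. In an autodiff
    # framework, `gap` would be treated as a constant in this term.
    policy_loss = np.mean(-log_probs * gap)

    # Value term: pull V(s) up toward returns it underestimated.
    value_loss = np.mean(0.5 * gap ** 2)

    return policy_loss, value_loss, policy_loss + value_coef * value_loss
```

Under these assumptions, a transition whose return falls below the value estimate is ignored entirely, so the update imitates only the better-than-expected part of the agent's own history.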


