Self-Imitation Learning in Sparse Reward Settings

10/14/2020
by Zhixin Chen, et al.

The application of reinforcement learning (RL) in the real world remains limited in environments with sparse and delayed rewards. Self-imitation learning (SIL) was developed as an auxiliary component of RL to alleviate this problem by encouraging agents to imitate their own best past behaviors. In this paper, we propose a practical SIL algorithm named Self-Imitation Learning with Constant Reward (SILCR). Instead of requiring hand-defined immediate rewards from the environment, our algorithm assigns the immediate reward at each timestep a constant value determined by the episode's final episodic reward. In this way, every action the agent takes is still properly guided, even when dense rewards from the environment are unavailable. We demonstrate the effectiveness of our method on several challenging MuJoCo locomotion tasks, and the results show that it significantly outperforms alternative methods in tasks with delayed and sparse rewards. Even compared with alternatives that have access to dense rewards, our method achieves competitive performance. Ablation experiments further demonstrate the stability and reproducibility of our method.
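The core idea of SILCR, replacing per-step environment rewards with a constant derived from the episode's final reward, can be illustrated with a short Python sketch. The abstract does not say how the constant is chosen, so the rule below is an assumption made purely for illustration: an episode whose episodic reward would enter the top `capacity` rewards seen so far is treated as "good" and every one of its timesteps receives `good` (default +1.0); otherwise every timestep receives `bad` (default -1.0). The class `ConstantRewardRelabeler` and its parameters are hypothetical and are not the authors' implementation.

```python
import heapq
from typing import Any, List, Tuple


class ConstantRewardRelabeler:
    """Hypothetical sketch of constant-reward relabeling.

    Assumption (not from the paper): an episode is "good" if its episodic
    reward is strictly better than the worst of the best `capacity`
    episodic rewards seen so far (or if fewer than `capacity` have been seen).
    """

    def __init__(self, capacity: int = 10, good: float = 1.0, bad: float = -1.0):
        self.capacity = capacity
        self.good = good
        self.bad = bad
        # Min-heap holding the best episodic rewards observed so far.
        self._best: List[float] = []

    def relabel(
        self, episode: List[Tuple[Any, Any]], episodic_reward: float
    ) -> List[Tuple[Any, Any, float]]:
        if len(self._best) < self.capacity:
            is_good = True
            heapq.heappush(self._best, episodic_reward)
        elif episodic_reward > self._best[0]:
            is_good = True
            # Drop the worst retained reward and keep the new one.
            heapq.heapreplace(self._best, episodic_reward)
        else:
            is_good = False
        r = self.good if is_good else self.bad
        # Every (observation, action) pair gets the same constant reward.
        return [(obs, act, r) for obs, act in episode]


if __name__ == "__main__":
    relabeler = ConstantRewardRelabeler(capacity=2)
    episode = [("s0", "a0"), ("s1", "a1")]
    for final_reward in (5.0, 3.0, 1.0, 4.0):
        print(final_reward, relabeler.relabel(episode, final_reward))
```

Under this assumed rule, the bar for a "good" episode rises as the agent improves, which is one plausible way to echo SIL's emphasis on imitating the best past behavior; the relabeled transitions could then stand in for the missing dense rewards in an off-policy learner.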

Related research

11/26/2020
Episodic Self-Imitation Learning with Hindsight
Episodic self-imitation learning, a novel self-imitation algorithm with ...

01/20/2020
Nested-Wasserstein Self-Imitation Learning for Sequence Generation
Reinforcement learning (RL) has been widely studied for improving sequen...

02/10/2021
Learning Equational Theorem Proving
We develop Stratified Shortest Solution Imitation Learning (3SIL) to lea...

05/20/2019
Perceptual Values from Observation
Imitation by observation is an approach for learning from expert demonst...

02/02/2023
Visual Imitation Learning with Patch Rewards
Visual imitation learning enables reinforcement learning agents to learn...

12/14/2021
Learning to Guide and to Be Guided in the Architect-Builder Problem
We are interested in interactive agents that learn to coordinate, namely...

05/20/2022
Learning Dense Reward with Temporal Variant Self-Supervision
Rewards play an essential role in reinforcement learning. In contrast to...
