Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations

06/17/2020
by Alexey Skrynnik, et al.

Deep reinforcement learning (RL) currently shows impressive results in complex gaming and robotic environments. These results often come at a huge computational cost and require an enormous number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning methods: hierarchical methods and expert demonstrations. In this paper, we propose a combination of these approaches that allows the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our forgetful experience replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when adapting the action space and state representation to the agent's capabilities. Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method is universal and can be integrated into various off-policy methods. It surpasses all known state-of-the-art RL methods that use expert demonstrations on various model environments. The solution based on our algorithm beats all other solutions in the MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
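The abstract does not spell out how the "forgetting" works, so the following is only a minimal sketch of one plausible reading: a replay buffer that fills each training batch with a shrinking share of demonstration transitions as training proceeds. The class name `ForgetfulReplayBuffer`, its parameters, and the linear decay schedule are all illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque


class ForgetfulReplayBuffer:
    """Toy buffer that samples a shrinking share of demonstration data.

    Hypothetical illustration only: the linear schedule and all names
    here are assumptions, not taken from the paper.
    """

    def __init__(self, demo_transitions, capacity=10_000,
                 initial_demo_fraction=1.0, forgetting_steps=1_000, seed=0):
        self.demo = list(demo_transitions)      # fixed expert data
        self.agent = deque(maxlen=capacity)     # agent's own experience
        self.initial_demo_fraction = initial_demo_fraction
        self.forgetting_steps = forgetting_steps
        self.rng = random.Random(seed)
        self.step = 0                           # number of batches sampled

    def demo_fraction(self):
        # Assumed schedule: linear decay from the initial fraction to zero.
        remaining = max(0.0, 1.0 - self.step / self.forgetting_steps)
        return self.initial_demo_fraction * remaining

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        if not self.demo and not self.agent:
            raise ValueError("buffer is empty")
        n_demo = round(batch_size * self.demo_fraction())
        if not self.agent:
            n_demo = batch_size    # nothing else to sample from yet
        elif not self.demo:
            n_demo = 0
        batch = []
        if n_demo > 0:
            batch += self.rng.choices(self.demo, k=n_demo)
        if batch_size - n_demo > 0:
            batch += self.rng.choices(list(self.agent), k=batch_size - n_demo)
        self.step += 1
        return batch
```

Under these assumptions, early batches consist mostly of expert data and later batches only of the agent's own experience, which is one way imperfect demonstrations could stop constraining the policy once the agent has surpassed them. Because the buffer only decides what goes into a batch, a sketch like this could sit under any off-policy learner.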

research
03/21/2022

Self-Imitation Learning from Demonstrations

Despite the numerous breakthroughs achieved with Reinforcement Learning ...
research
12/18/2019

Hierarchical Deep Q-Network with Forgetting from Imperfect Demonstrations in Minecraft

We present hierarchical Deep Q-Network with Forgetting (HDQF) that took ...
research
10/21/2019

Self-Educated Language Agent With Hindsight Experience Replay For Instruction Following

Language creates a compact representation of the world and allows the de...
research
12/04/2019

Learning from Interventions using Hierarchical Policies for Safe Learning

Learning from Demonstrations (LfD) via Behavior Cloning (BC) works well ...
research
02/20/2019

Curiosity-Driven Experience Prioritization via Density Estimation

In Reinforcement Learning (RL), an agent explores the environment and co...
research
10/05/2020

Policy Learning Using Weak Supervision

Most existing policy learning solutions require the learning agents to r...
research
01/11/2022

Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks

During recent years, deep reinforcement learning (DRL) has made successf...
