Bias-reduced multi-step hindsight experience replay

02/25/2021
by Rui Yang, et al.

Multi-goal reinforcement learning is widely used in planning and robot manipulation. Its two main challenges are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle both challenges with hindsight knowledge. However, HER and its previous variants still need millions of samples and heavy computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), which is based on n-step relabeling and incorporates multi-step relabeled returns to improve sample efficiency. Despite the advantages of n-step relabeling, we show theoretically and experimentally that the off-policy n-step bias it introduces may lead to poor performance in many environments. To address this issue, we present two bias-reduced MHER algorithms, MHER(λ) and Model-based MHER (MMHER). MHER(λ) exploits the λ return, while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions successfully alleviate off-policy n-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER with little additional computation beyond HER.
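
To make the idea of n-step relabeled returns concrete, the following is a minimal sketch and not the paper's implementation. It assumes a sparse reward of 0 on success and -1 otherwise, a relabeled goal that has already been chosen in hindsight, and precomputed bootstrap values; the names `sparse_reward`, `n_step_relabeled_targets`, and `lambda_mix` are hypothetical. The mixing function uses a normalized, truncated λ-weighting of the k-step targets, which is one common form of the λ return and may differ from the exact definition used for MHER(λ).

```python
import numpy as np


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0 if the achieved goal is within tol of the goal, else -1."""
    return 0.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) <= tol else -1.0


def n_step_relabeled_targets(achieved_next, goal, q_boot, gamma=0.98):
    """Compute k-step targets for k = 1..n against a relabeled goal.

    achieved_next[i] is the achieved goal after step i (i = 0..n-1).
    q_boot[i] is an assumed bootstrap estimate Q(s_{i+1}, pi(s_{i+1}), goal).
    """
    targets = []
    discounted_sum = 0.0
    for i, achieved in enumerate(achieved_next):
        # Rewards are recomputed against the hindsight goal, not the original one.
        discounted_sum += gamma ** i * sparse_reward(achieved, goal)
        targets.append(discounted_sum + gamma ** (i + 1) * q_boot[i])
    return np.array(targets)


def lambda_mix(targets, lam):
    """Normalized truncated lambda-weighted average of the k-step targets.

    lam = 0 recovers the 1-step target; lam = 1 is the plain average of all n targets.
    """
    weights = float(lam) ** np.arange(len(targets))
    return float(np.dot(weights, targets) / weights.sum())


# Toy 1-D trajectory that reaches the relabeled goal on the third step.
achieved_next = [[0.0], [0.5], [1.0]]
relabeled_goal = [1.0]
q_boot = [-2.0, -1.0, 0.0]

targets = n_step_relabeled_targets(achieved_next, relabeled_goal, q_boot, gamma=0.5)
one_step_target = lambda_mix(targets, lam=0.0)
averaged_target = lambda_mix(targets, lam=1.0)
```

In this sketch, longer horizons rely on more rewards collected by the behavior policy, which is where off-policy n-step bias can enter; a smaller λ shifts weight back toward the 1-step target.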


