Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment

10/28/2021
by Tung M. Luu, et al.

This paper proposes Hindsight Goal Ranking (HGR), a method for prioritizing replay experience that overcomes a limitation of Hindsight Experience Replay (HER), which selects hindsight goals by uniform sampling. HGR samples the states visited in an episode with probability proportional to their temporal difference (TD) error, which serves as a proxy for how much the RL agent can learn from an experience. The sampling is performed in two steps: first, an episode is drawn from the replay buffer with probability proportional to the average TD error of its experiences; then, within the sampled episode, a hindsight goal is drawn from the future visited states, again with higher probability assigned to states with larger TD error. Combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm, the proposed method learns significantly faster than HER without prioritization on four challenging simulated robotic manipulation tasks. The empirical results show that HGR uses samples more efficiently than previous methods across all tasks.
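The two-step sampling can be illustrated with a minimal sketch, assuming each stored episode keeps a per-transition TD error. The names here (episode_td_errors, sample_episode, sample_hindsight_goal) are illustrative assumptions rather than the authors' implementation, and practical details such as priority exponents, small epsilon offsets, and importance-sampling corrections used by prioritized replay are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(episode_td_errors):
    """Sample an episode index with probability proportional to the
    average TD error of its stored experiences (first sampling step)."""
    avg_errors = np.array([np.mean(e) for e in episode_td_errors])
    probs = avg_errors / avg_errors.sum()
    return rng.choice(len(episode_td_errors), p=probs)

def sample_hindsight_goal(td_errors, t):
    """Within an episode, sample a hindsight goal from the states visited
    after time step t, with probability proportional to their TD errors
    (second sampling step)."""
    future_errors = np.asarray(td_errors[t + 1:], dtype=float)
    probs = future_errors / future_errors.sum()
    goal_offset = rng.choice(len(future_errors), p=probs)
    return t + 1 + goal_offset  # index of the future state relabelled as the goal

# Toy usage: three episodes of 50 steps with hypothetical per-step TD errors.
episode_td_errors = [np.abs(rng.normal(size=50)) for _ in range(3)]
ep = sample_episode(episode_td_errors)
goal_idx = sample_hindsight_goal(episode_td_errors[ep], t=10)
print(ep, goal_idx)
```

In this sketch, episodes and goals with larger TD errors are revisited more often, which is the prioritization that replaces HER's uniform choice of hindsight goals.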

