Hindsight Experience Replay with Kronecker Product Approximate Curvature

10/09/2020
by   Dhuruva Priyan G M, et al.

Hindsight Experience Replay (HER) is an efficient algorithm for solving reinforcement learning tasks in sparse-reward environments. However, its limited sample efficiency and slow convergence keep HER from performing effectively. Natural gradients address these challenges by helping the model parameters converge better and by avoiding bad actions that collapse training performance. However, updating neural network parameters in this way requires expensive computation and thus increases training time. Our proposed method addresses the above challenges, achieving better sample efficiency, faster convergence, and a higher success rate. A common failure mode of DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then breaks the policy because it exploits the errors in the Q-function. We address this issue by incorporating Twin Delayed Deep Deterministic Policy Gradients (TD3) into HER. TD3 learns two Q-functions instead of one and adds noise to the target action, making it harder for the policy to exploit Q-function errors. The experiments are conducted on OpenAI's MuJoCo environments. Results on these environments show that our algorithm (TDHER+KFAC) performs better in most of the scenarios.
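The two TD3 ingredients named in the abstract, a pair of Q-functions and noise on the target action, can be sketched in a few lines of NumPy. This is a minimal illustration of the standard TD3 target computation, not the authors' implementation: the function names, the noise scale and clip values, the action limit, and the discount factor are all assumptions chosen for the example.

```python
import numpy as np

def smoothed_target_action(target_action, rng, noise_std=0.2,
                           noise_clip=0.5, act_limit=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target action.

    noise_std, noise_clip and act_limit are illustrative values (assumptions).
    """
    noise = rng.normal(0.0, noise_std, size=target_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    # Keep the perturbed action inside the valid action range.
    return np.clip(target_action + noise, -act_limit, act_limit)

def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two critics.

    next_q1 and next_q2 stand for the two target critics evaluated at the
    next state and the smoothed target action (hypothetical inputs here).
    """
    min_q = np.minimum(next_q1, next_q2)
    return reward + gamma * (1.0 - done) * min_q
```

Taking the minimum of the two critics counters the overestimation described above, and the noise on the target action smooths the value estimate so that the policy cannot exploit sharp, erroneous peaks in the Q-function.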


