Look Back When Surprised: Stabilizing Reverse Experience Replay for Neural Approximation

06/07/2022
by Ramnath Kumar, et al.

Experience replay methods, an essential component of reinforcement learning (RL) algorithms, are designed to mitigate spurious correlations and biases while learning from temporally dependent data. Roughly speaking, these methods let us draw batched data from a large buffer so that temporal correlations do not hinder the performance of descent algorithms. In this experimental work, we consider the recently developed and theoretically rigorous reverse experience replay (RER), which has been shown to remove such spurious biases in simplified theoretical settings. We combine RER with optimistic experience replay (OER) to obtain RER++, which is stable under neural function approximation. We show experimentally that RER++ outperforms techniques such as prioritized experience replay (PER) on various tasks, at significantly lower computational cost. It is well known in the RL literature that greedily choosing the examples with the largest TD error (as in OER) or forming mini-batches from consecutive data points (as in RER) leads to poor performance on its own; our method, which combines these two techniques, nevertheless works very well.
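To make the recipe concrete, here is a minimal Python sketch of a buffer that combines the two ingredients the abstract names: greedy selection by TD error (OER-style) followed by reversed replay of a contiguous segment (RER-style). This is an illustration under our own assumptions, not the paper's implementation; the class name RERppBuffer, the fixed segment_len, and the sliding-window scoring are all hypothetical.

```python
import numpy as np

class RERppBuffer:
    """Illustrative RER++-style buffer (details assumed, not from the
    paper): transitions are kept in arrival order, a contiguous segment
    is chosen greedily by summed |TD error| (the OER-style step), and
    that segment is replayed in reverse temporal order (the RER step)."""

    def __init__(self, capacity, segment_len):
        self.capacity = capacity        # max transitions stored
        self.segment_len = segment_len  # length of each replayed segment
        self.transitions = []           # (state, action, reward, next_state, done)
        self.td_errors = []             # last known |TD error| per transition

    def add(self, transition, td_error=1.0):
        # Drop the oldest transition once the buffer is full.
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.td_errors.pop(0)
        self.transitions.append(transition)
        self.td_errors.append(abs(td_error))

    def sample(self):
        # Score every contiguous segment by its total |TD error| via a
        # sliding-window sum, pick the highest-scoring one (greedy /
        # optimistic selection), and return it newest-to-oldest.
        n = len(self.transitions)
        if n == 0:
            return [], []
        k = min(self.segment_len, n)
        window = np.convolve(np.asarray(self.td_errors), np.ones(k), mode="valid")
        start = int(np.argmax(window))
        idx = list(range(start, start + k))[::-1]  # reversed (RER) order
        return idx, [self.transitions[i] for i in idx]

    def update_errors(self, indices, new_errors):
        # Refresh stored TD errors after a learning step so future
        # segment selection reflects the current surprise.
        for i, e in zip(indices, new_errors):
            self.td_errors[i] = abs(e)
```

A training loop would then call buf.add((s, a, r, s2, done), td_error=delta) after each environment step, draw idx, batch = buf.sample() to get a reversed high-surprise segment, and call buf.update_errors(idx, new_deltas) once the segment has been used for a gradient step. The list storage and O(n) window scan here are for clarity only; a practical buffer would use ring storage and an incremental priority structure.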


Related research

A Deeper Look at Experience Replay (12/04/2017)
Experience replay plays an important role in the success of deep reinfor...

Temporal Difference Learning with Experience Replay (06/16/2023)
Temporal-difference (TD) learning is widely regarded as one of the most ...

Revisiting Fundamentals of Experience Replay (07/13/2020)
Experience replay is central to off-policy algorithms in deep reinforcem...

Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning (02/22/2021)
Deep Reinforcement Learning (RL) methods rely on experience replay to ap...

Streaming Linear System Identification with Reverse Experience Replay (03/10/2021)
We consider the problem of estimating a stochastic linear time-invariant...

Distributed Online System Identification for LTI Systems Using Reverse Experience Replay (07/03/2022)
Identification of linear time-invariant (LTI) systems plays an important...

Convergence Results For Q-Learning With Experience Replay (12/08/2021)
A commonly used heuristic in RL is experience replay (e.g. <cit.>), in w...
