Remember and Forget for Experience Replay

07/16/2018
by Guido Novati, et al.

Experience replay (ER) is crucial for attaining high data efficiency in off-policy deep reinforcement learning (RL). ER entails recalling experiences gathered in past iterations to compute gradient estimates for the current policy. However, the accuracy of such updates may deteriorate when the policy diverges from past behaviors. Remedies that aim to abate policy changes, such as target networks and hyper-parameter tuning, do not prevent the policy from becoming disconnected from past experiences, possibly undermining the effectiveness of ER. We introduce an algorithm that relies on systematic Remembering and Forgetting for Experience Replay (ReF-ER). In ReF-ER, the RL agents forget experiences that would be too unlikely under the current policy and constrain policy changes to a trust region around the past behaviors stored in the replay memory. We show that ReF-ER improves the reliability and performance of off-policy RL in both the deterministic and the stochastic policy gradient settings. Finally, we complement ReF-ER with a novel off-policy actor-critic algorithm (RACER) for continuous-action control problems. RACER employs a computationally efficient closed-form approximation of on-policy action values and is shown to be highly competitive with state-of-the-art algorithms on benchmark problems, while being robust to large hyper-parameter variations.
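To make the remember/forget rule concrete, the sketch below classifies replayed experiences by their importance ratio under the current policy, so that gradients are computed only from "near-policy" samples. It is a minimal illustration, not the authors' implementation: the Experience container, the log_prob_fn callable, and the default c_max threshold are all assumptions introduced here for clarity.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Experience:
    state: np.ndarray
    action: np.ndarray
    reward: float
    behavior_log_prob: float  # log mu(a|s), stored when the sample was collected


def split_near_far(batch, log_prob_fn, c_max=4.0):
    """Classify replayed experiences as near- or far-policy.

    An experience is "far-policy" when the importance ratio
    rho = pi(a|s) / mu(a|s) falls outside [1/c_max, c_max];
    only near-policy samples contribute to the gradient estimate.
    (Illustrative sketch; c_max is a tunable hyper-parameter.)
    """
    near, far = [], []
    for exp in batch:
        # Ratio of current-policy to behavior likelihood, computed in log space.
        log_rho = log_prob_fn(exp.state, exp.action) - exp.behavior_log_prob
        rho = np.exp(log_rho)
        (near if 1.0 / c_max < rho < c_max else far).append(exp)
    return near, far
```

The complementary trust-region rule, omitted from this sketch, additionally penalizes divergence between the current policy and the behaviors stored in the replay memory, so that the fraction of far-policy samples stays bounded.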


Related research

09/29/2020 · Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy
Experience replay (ER) improves the data efficiency of off-policy reinfo...

06/10/2019 · Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past
Soft Actor-Critic (SAC) is an off-policy actor-critic deep reinforcement...

03/04/2021 · Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings
Recent advances in off-policy deep reinforcement learning (RL) have led ...

09/25/2019 · Off-Policy Actor-Critic with Shared Experience Replay
We investigate the combination of actor-critic reinforcement learning al...

05/05/2020 · Discrete-to-Deep Supervised Policy Learning
Neural networks are effective function approximators, but hard to train ...

06/23/2020 · Experience Replay with Likelihood-free Importance Weights
The use of past experiences to accelerate temporal difference (TD) learn...

06/05/2023 · Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Learning high-quality Q-value functions plays a key role in the success ...
