Large Batch Experience Replay

10/04/2021
by Thibault Lahire, et al.

Several algorithms have been proposed to sample the replay buffer of deep Reinforcement Learning (RL) agents non-uniformly in order to speed up learning, but few theoretical foundations for these sampling schemes have been provided. Among them, Prioritized Experience Replay stands out as a hyperparameter-sensitive heuristic, even though it can deliver good performance. In this work, we cast replay buffer sampling as an importance sampling problem for estimating the gradient. This allows us to derive the theoretically optimal sampling distribution, which yields the best theoretical convergence speed. Building on this ideal sampling scheme, we provide new theoretical foundations for Prioritized Experience Replay. Since the optimal sampling distribution is intractable, we make several approximations that perform well in practice and introduce, among others, LaBER (Large Batch Experience Replay), an easy-to-code and efficient method for sampling the replay buffer. LaBER, which can be combined with Deep Q-Networks, distributional RL agents, or actor-critic methods, yields improved performance across a diverse range of Atari games and PyBullet environments, compared both to the base agent it is built on and to other prioritization schemes.
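The variance-minimizing distribution this analysis points to samples each transition with probability proportional to the norm of its per-sample gradient, which is exactly what makes it intractable to compute for the whole buffer. The sketch below, in NumPy, illustrates one way a LaBER-style step can approximate it: draw a large batch uniformly, score it with a cheap surrogate priority, then down-sample the training mini-batch proportionally to those scores. It is an illustration of the general scheme, not the authors' reference implementation; in particular, the helper `td_error_magnitudes(q_net, batch)` and the large-batch factor `m` are hypothetical placeholders.

```python
import numpy as np

def laber_sample(buffer, q_net, batch_size, m=4, rng=None):
    """One LaBER-style sampling step (illustrative sketch, not the paper's code).

    buffer     : indexable container of transitions (assumed)
    q_net      : current value network, used only to score transitions (assumed)
    batch_size : size of the mini-batch actually used for the gradient step
    m          : the large batch holds m * batch_size transitions
    """
    rng = rng or np.random.default_rng()

    # 1. Draw a large batch uniformly from the replay buffer.
    large_idx = rng.integers(len(buffer), size=m * batch_size)
    large_batch = [buffer[i] for i in large_idx]

    # 2. Score it with a surrogate priority. `td_error_magnitudes` is a
    #    hypothetical helper returning one non-negative value per transition,
    #    standing in for the per-sample gradient norm.
    priorities = np.asarray(td_error_magnitudes(q_net, large_batch)) + 1e-8
    probs = priorities / priorities.sum()

    # 3. Down-sample the mini-batch proportionally to the priorities.
    idx = rng.choice(len(large_batch), size=batch_size, p=probs)
    mini_batch = [large_batch[i] for i in idx]

    # 4. Importance weights so the prioritized mini-batch still approximates
    #    the uniform-sampling gradient; normalizing by the large batch's mean
    #    priority keeps the weights on a sensible scale.
    weights = priorities.mean() / priorities[idx]
    return mini_batch, weights
```

In such a scheme, the returned weights would typically multiply the per-transition losses before averaging, and the factor m trades extra forward passes on the large batch for a better-targeted mini-batch.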

research
12/04/2017

A Deeper Look at Experience Replay

Experience replay plays an important role in the success of deep reinfor...
research
11/01/2022

Event Tables for Efficient Experience Replay

Experience replay (ER) is a crucial component of many deep reinforcement...
research
12/08/2021

Replay For Safety

Experience replay is a widely used technique to achieve efficient...
research
07/12/2021

Learning Expected Emphatic Traces for Deep RL

Off-policy sampling and experience replay are key for improving sample e...
research
07/15/2023

An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets

Reinforcement Learning (RL) algorithms aim to learn an optimal policy by...
research
12/08/2021

Convergence Results For Q-Learning With Experience Replay

A commonly used heuristic in RL is experience replay, in w...
research
09/26/2022

Paused Agent Replay Refresh

Reinforcement learning algorithms have become more complex since the inv...
