Convergence Results For Q-Learning With Experience Replay

12/08/2021
by   Liran Szlak, et al.

A commonly used heuristic in RL is experience replay (e.g. <cit.>), in which a learner stores and re-uses past trajectories as if they were sampled online. In this work, we initiate a rigorous study of this heuristic in the setting of tabular Q-learning. We provide a convergence rate guarantee, and discuss how it compares to the convergence rate of standard Q-learning depending on important parameters such as the frequency and number of replay iterations. We also provide theoretical evidence showing when we might expect this heuristic to strictly improve performance, by introducing and analyzing a simple class of MDPs. Finally, we provide experiments to support our theoretical findings.
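To make the heuristic concrete, the following is a minimal sketch of tabular Q-learning with experience replay: stored transitions are periodically re-used for additional Q-updates as if they were fresh online samples. This is an illustration, not the paper's exact algorithm; the toy chain MDP, the replay schedule (`replay_every`, `replay_iters`), and all hyperparameters are assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the paper's algorithm): tabular Q-learning
# with a simple experience replay buffer on a toy 5-state chain MDP.
import random

n_states, n_actions = 5, 2
gamma, alpha, eps = 0.9, 0.1, 0.1

def step(s, a):
    """Toy chain MDP: action 1 moves right, action 0 moves left; reward 1 at the right end."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

Q = [[0.0] * n_actions for _ in range(n_states)]
buffer = []                              # stored transitions (s, a, r, s')
replay_every, replay_iters = 10, 5       # illustrative replay schedule

def q_update(s, a, r, s_next):
    """Standard tabular Q-learning update, applied to both online and replayed samples."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

s = 0
for t in range(1, 2001):
    # epsilon-greedy action selection
    a = random.randrange(n_actions) if random.random() < eps else max(range(n_actions), key=lambda a_: Q[s][a_])
    s_next, r = step(s, a)
    buffer.append((s, a, r, s_next))
    q_update(s, a, r, s_next)            # online update
    if t % replay_every == 0:
        for _ in range(replay_iters):    # replay pass: re-use stored transitions as if sampled online
            q_update(*random.choice(buffer))
    s = 0 if s_next == n_states - 1 else s_next   # reset episode at the goal state

print([max(q) for q in Q])               # learned state values
```

The frequency (`replay_every`) and number (`replay_iters`) of replay iterations are exactly the kind of parameters whose effect on the convergence rate the paper analyzes.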

