Temporal Difference Learning with Experience Replay

06/16/2023
by Han-Dong Lim et al.

Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, only recently have researchers begun to actively study its finite-time behavior, including finite-time bounds on the mean-squared error and the sample complexity. On the empirical side, experience replay has been a key ingredient in the success of deep RL algorithms, but its theoretical effects on RL have yet to be fully understood. In this paper, we present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay. Specifically, under the Markovian observation model, we show that, for both the averaged iterate and the final iterate, the error term induced by a constant step-size can be effectively controlled by the sizes of the replay buffer and of the mini-batch sampled from it.
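To make the setup concrete, the sketch below implements TD(0) with linear function approximation, a constant step-size, and mini-batches drawn from a replay buffer fed by a single Markovian trajectory, returning both the final and the averaged iterate. This is an illustrative reading of the abstract, not the authors' implementation: the environment interface (`reset`, `sample_action`, `step`), the feature map `phi`, uniform sampling from the buffer, and all hyperparameter names are assumptions made for the example.

```python
# Minimal sketch of TD(0) with linear function approximation and an
# experience replay buffer. All interfaces and names here (env, phi,
# buffer_size, batch_size, alpha) are illustrative assumptions, not
# the paper's code.

import numpy as np
from collections import deque

def td0_with_replay(env, phi, d, gamma=0.99, alpha=0.05,
                    buffer_size=10_000, batch_size=32,
                    num_steps=50_000, seed=0):
    """TD(0) with linear value estimate V(s) = phi(s) @ theta.

    Transitions come from one Markovian trajectory; each update averages
    TD(0) semi-gradients over a mini-batch sampled uniformly from the
    replay buffer, using a constant step-size alpha throughout.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)                 # current (final) iterate
    theta_avg = np.zeros(d)             # running average of the iterates
    buffer = deque(maxlen=buffer_size)  # replay buffer of size N

    s = env.reset()
    for t in range(1, num_steps + 1):
        a = env.sample_action(s)        # fixed behavior policy (assumed)
        s_next, r = env.step(s, a)      # Markovian transition (assumed API)
        buffer.append((s, r, s_next))
        s = s_next

        if len(buffer) >= batch_size:
            # Mini-batch of B transitions drawn uniformly from the buffer.
            idx = rng.integers(len(buffer), size=batch_size)
            g = np.zeros(d)
            for i in idx:
                si, ri, si1 = buffer[i]
                delta = ri + gamma * phi(si1) @ theta - phi(si) @ theta
                g += delta * phi(si)
            theta = theta + alpha * g / batch_size  # constant step-size

        theta_avg += (theta - theta_avg) / t        # averaged iterate

    return theta, theta_avg
```

In this reading, the quantities the paper's bounds speak to are the buffer size `buffer_size` (N) and the mini-batch size `batch_size` (B): per the abstract, the error term induced by the constant step-size `alpha` is controlled by these two sizes, for both the returned final iterate and the averaged iterate.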
