Temporal Difference Learning as Gradient Splitting

10/27/2020
by Rui Liu, et al.

Temporal difference learning with linear function approximation is a popular method for obtaining a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in terms of a splitting of the gradient of an appropriately chosen function. As a consequence of this interpretation, convergence proofs for gradient descent can be applied almost verbatim to temporal difference learning. Beyond giving a new, fuller explanation of why temporal difference learning works, our interpretation also yields improved convergence times. We consider the setting with step size 1/√T, where previous comparable finite-time convergence bounds for temporal difference learning had a multiplicative factor of 1/(1-γ) in front of the bound, with γ being the discount factor. We show that a minor variation of TD learning, which estimates the mean of the value function separately, has a convergence time in which 1/(1-γ) multiplies only an asymptotically negligible term.
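As a hedged illustration of what "gradient splitting" can mean here (our own sketch, not code from the paper), the Python below builds a small random Markov chain with linear features and checks one consequence numerically. Assume π is the stationary distribution of the transition matrix P, D = diag(π), and θ* is the TD fixed point. Then the mean TD direction g(θ) = Φᵀ D (Φθ - r - γPΦθ) satisfies ⟨θ - θ*, g(θ)⟩ = ⟨θ - θ*, ∇f(θ)⟩ for the quadratic f(θ) = ½[(1 - γ)‖V_θ - V_θ*‖²_D + γ‖V_θ - V_θ*‖²_Dir], where ‖V‖²_Dir = ½ Σ π(s)P(s,t)(V(t) - V(s))². We reconstructed this f from the standard stationarity argument, so the paper's exact function and normalization may differ. The sketch also runs plain linear TD(0) with a constant step size 1/√T and returns the averaged iterate. It does not implement the mean-estimating variant. The toy sizes, γ = 0.9, T, the random seed, and every helper name (expected_td_direction, grad_f, td0, stationary) are hypothetical choices made for illustration.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def stationary(P, iters=2000):
    """Stationary distribution of a row-stochastic matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
    return pi


def expected_td_direction(theta, Phi, P, r, pi, gamma):
    """g(theta) = Phi^T D (Phi theta - r - gamma P Phi theta); on average TD(0) moves along -g."""
    V = matvec(Phi, theta)
    PV = matvec(P, V)
    resid = [V[s] - r[s] - gamma * PV[s] for s in range(len(V))]
    return [sum(pi[s] * Phi[s][k] * resid[s] for s in range(len(V))) for k in range(len(theta))]


def grad_f(theta, theta_star, Phi, P, pi, gamma):
    """Gradient of f = 0.5 * [(1 - gamma) * ||delta||_D^2 + gamma * ||delta||_Dir^2], where
    delta = Phi (theta - theta_star) and
    ||delta||_Dir^2 = 0.5 * sum_{s,t} pi(s) P(s,t) (delta(t) - delta(s))^2 (hypothetical f)."""
    n = len(Phi)
    delta = matvec(Phi, [a - c for a, c in zip(theta, theta_star)])
    grad = []
    for k in range(len(theta)):
        term_d = sum(pi[s] * delta[s] * Phi[s][k] for s in range(n))
        term_dir = 0.5 * sum(pi[s] * P[s][t] * (delta[t] - delta[s]) * (Phi[t][k] - Phi[s][k])
                             for s in range(n) for t in range(n))
        grad.append((1 - gamma) * term_d + gamma * term_dir)
    return grad


def td0(Phi, P, r, gamma, T, seed=0):
    """Linear TD(0) on a single trajectory with constant step 1/sqrt(T); returns the averaged iterate."""
    rng = random.Random(seed)
    n, d = len(Phi), len(Phi[0])
    alpha = 1.0 / math.sqrt(T)
    theta, avg = [0.0] * d, [0.0] * d
    s = rng.randrange(n)
    for _ in range(T):
        s_next = rng.choices(range(n), weights=P[s])[0]
        td_err = r[s] + gamma * dot(Phi[s_next], theta) - dot(Phi[s], theta)
        theta = [th + alpha * td_err * phi for th, phi in zip(theta, Phi[s])]
        avg = [a + th / T for a, th in zip(avg, theta)]
        s = s_next
    return avg


# Hypothetical toy problem: 5 states, d = 2 features, all transition probabilities positive.
rng = random.Random(1)
n, d, gamma = 5, 2, 0.9
P = []
for _ in range(n):
    row = [rng.uniform(0.1, 1.0) for _ in range(n)]
    total = sum(row)
    P.append([x / total for x in row])
r = [rng.uniform(0.0, 1.0) for _ in range(n)]
Phi = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
pi = stationary(P)

# Mean TD direction is g(theta) = A theta - b; the TD fixed point solves A theta = b.
A = [[sum(pi[s] * Phi[s][i] * (Phi[s][k] - gamma * sum(P[s][t] * Phi[t][k] for t in range(n)))
          for s in range(n)) for k in range(d)] for i in range(d)]
b = [sum(pi[s] * Phi[s][i] * r[s] for s in range(n)) for i in range(d)]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # Cramer's rule, valid because d = 2 here
theta_star = [(b[0] * A[1][1] - A[0][1] * b[1]) / det, (A[0][0] * b[1] - A[1][0] * b[0]) / det]

# Splitting check: the inner products with u = theta - theta_star coincide.
theta = [1.0, -2.0]
u = [a - c for a, c in zip(theta, theta_star)]
lhs = dot(u, expected_td_direction(theta, Phi, P, r, pi, gamma))
rhs = dot(u, grad_f(theta, theta_star, Phi, P, pi, gamma))
print("<u, g(theta)> =", lhs, " <u, grad f(theta)> =", rhs)

theta_hat = td0(Phi, P, r, gamma, T=100_000)
print("averaged TD(0) iterate:", theta_hat, " TD fixed point:", theta_star)
```

The two printed inner products should agree up to floating-point rounding even though g(θ) and ∇f(θ) are different vectors in general: A is not symmetric, while ∇f uses only its symmetric part (A + Aᵀ)/2, and ⟨u, Au⟩ = ⟨u, ((A + Aᵀ)/2)u⟩ for every u. So a step along -g has the same first-order effect on the distance to θ* as a gradient step on f, which is the sense in which gradient-descent arguments can carry over. The TD(0) run is only a qualitative illustration; how close the averaged iterate lands to the fixed point depends on T, γ, and the features.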
