
Temporal Difference Learning as Gradient Splitting
Temporal difference learning with linear function approximation is a popular method for obtaining a low-dimensional approximation of the value function of a policy in a Markov decision process. We give a new interpretation of this method in terms of a splitting of the gradient of an appropriately chosen function. As a consequence of this interpretation, convergence proofs for gradient descent can be applied almost verbatim to temporal difference learning. Beyond giving a new, fuller explanation of why temporal difference learning works, our interpretation also yields improved convergence times. We consider the setting with a 1/√T step size, where previous comparable finite-time convergence bounds for temporal difference learning had a multiplicative factor of 1/(1−γ) in front of the bound, with γ being the discount factor. We show that a minor variation on TD learning which estimates the mean of the value function separately has a convergence time where 1/(1−γ) multiplies only an asymptotically negligible term.
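To make the object of the analysis concrete, here is a minimal sketch of TD(0) with linear function approximation and the constant 1/√T step size discussed in the abstract. The toy two-state Markov reward process, the identity feature map, and all variable names are illustrative assumptions, not taken from the paper; the mean-estimating variant the abstract describes is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MRP (assumed for illustration): transition matrix P, rewards r, discount gamma.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

# Identity features: the tabular case as a special case of linear approximation.
Phi = np.eye(2)

T = 50_000
alpha = 1.0 / np.sqrt(T)   # constant 1/sqrt(T) step size, as in the setting above
theta = np.zeros(2)

s = 0
for _ in range(T):
    s_next = rng.choice(2, p=P[s])
    # TD(0) update: theta <- theta + alpha * delta * phi(s),
    # with the temporal difference error delta.
    delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta = theta + alpha * delta * Phi[s]
    s = s_next

# The true value function solves (I - gamma * P) v = r.
v_true = np.linalg.solve(np.eye(2) - gamma * P, r)
```

With tabular features the TD fixed point coincides with the true value function, so `theta` should approach `v_true` up to step-size-induced noise; with a richer feature matrix `Phi` it would instead converge to the projected fixed point.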