A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

06/06/2018
by   Jalaj Bhandari, et al.
0

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. A final section of the paper shows that all of our main results extend to Q-learning applied to high dimensional optimal stopping problems.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/29/2021

Control Theoretic Analysis of Temporal Difference Learning

The goal of this paper is to investigate a control theoretic analysis of...
research
10/27/2020

Temporal Difference Learning as Gradient Splitting

Temporal difference learning with linear function approximation is a pop...
research
09/09/2021

Versions of Gradient Temporal Difference Learning

Sutton, Szepesvári and Maei introduced the first gradient temporal-diffe...
research
11/06/2018

Online Off-policy Prediction

This paper investigates the problem of online prediction learning, where...
research
04/22/2022

Analysis of Temporal Difference Learning: Linear System Approach

The goal of this technical note is to introduce a new finite-time conver...
research
11/19/2010

Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view

We investigate projection methods, for evaluating a linear approximation...
research
10/24/2020

An Adiabatic Theorem for Policy Tracking with TD-learning

We evaluate the ability of temporal difference learning to track the rew...

Please sign up or login with your details

Forgot password? Click here to reset