On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

11/12/2014
by Nathaniel Korda, et al.

We provide non-asymptotic bounds for the well-known temporal difference learning algorithm TD(0) with linear function approximators. These include high-probability bounds as well as bounds in expectation. Our analysis suggests that a step-size inversely proportional to the number of iterations cannot guarantee the optimal rate of convergence unless we assume (partial) knowledge of the stationary distribution of the Markov chain underlying the policy considered. We also provide bounds for the iterate-averaged TD(0) variant, which removes this step-size dependency while exhibiting the optimal rate of convergence. Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation. We demonstrate the usefulness of our bounds in two synthetic experimental settings.
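For orientation, the following is a minimal sketch of TD(0) with linear function approximation together with a running iterate average, the two schemes the abstract compares. It is not the authors' code: the function name `td0_linear`, the two-state chain, the one-hot features, the discount of 0.5 and the step-size schedule `1 / sqrt(n)` are all illustrative assumptions chosen so that the fixed point can be checked by hand.

```python
import random


def td0_linear(phi, P, r, discount, num_steps, step_size, seed=0):
    """Run TD(0) with the linear estimate v(s) = <theta, phi[s]>.

    Returns the last iterate and the running average of all iterates.
    All argument names are hypothetical, chosen for this sketch.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(phi[0])
    theta_bar = [0.0] * len(phi[0])
    s = 0
    for n in range(1, num_steps + 1):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(range(len(P)), weights=P[s])[0]
        v_s = sum(t * f for t, f in zip(theta, phi[s]))
        v_next = sum(t * f for t, f in zip(theta, phi[s_next]))
        # Temporal difference error for the observed transition.
        delta = r[s] + discount * v_next - v_s
        g = step_size(n)
        theta = [t + g * delta * f for t, f in zip(theta, phi[s])]
        # Running mean of the iterates (iterate averaging).
        theta_bar = [b + (t - b) / n for b, t in zip(theta_bar, theta)]
        s = s_next
    return theta, theta_bar


# Assumed toy problem: two states, uniform transitions, one-hot features.
# Solving v = r + 0.5 * P v by hand gives the fixed point [1.5, 0.5].
phi = [[1.0, 0.0], [0.0, 1.0]]
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
theta, theta_bar = td0_linear(phi, P, r, 0.5, 20000, lambda n: 1.0 / n ** 0.5)
print("last iterate:", theta)
print("averaged iterate:", theta_bar)
```

With one-hot features the linear estimate is exact, so both outputs should land near the hand-computed fixed point. The schedule `1 / sqrt(n)` decays more slowly than `1 / n`, which is the kind of choice usually paired with averaging. The centered variant mentioned in the abstract is not sketched here, since the abstract does not specify its update.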

research
10/12/2022

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

We study the finite-time behaviour of the popular temporal difference (T...
research
02/20/2020

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the celebrated temporal difference (TD) learning alg...
research
07/10/2022

Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation

This paper provides a finite-time analysis of linear stochastic approxim...
research
06/11/2013

Stochastic approximation for speeding up LSTD (and LSPI)

We propose a stochastic approximation (SA) based method with randomizati...
research
12/29/2020

Fast Incremental Expectation Maximization for finite-sum optimization: nonasymptotic convergence

Fast Incremental Expectation Maximization (FIEM) is a version of the EM ...
research
10/27/2021

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

The paper concerns convergence and asymptotic statistics for stochastic ...
research
06/18/2014

A Generalized Markov-Chain Modelling Approach to (1,λ)-ES Linear Optimization: Technical Report

Several recent publications investigated Markov-chain modelling of linea...
