Rate of Convergence and Error Bounds for LSTD(λ)

05/13/2014
by Manel Tagorti, et al.

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0,1), a high-probability estimate of the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case where λ=0. In particular, our analysis sheds some light on how the choice of λ depends on the quality of the chosen linear space and the number of samples, in a way that is consistent with simulations.
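
For readers unfamiliar with the algorithm, the following is a minimal sketch of the usual textbook form of LSTD(λ) on a single trajectory. It is not taken from the paper. The names `lstd_lambda`, `solve`, `phi`, `gamma`, and `lam` are illustrative assumptions, as is the two-state toy chain used to exercise it. The estimator accumulates a matrix A and a vector b along the trajectory using an eligibility trace z, then solves A θ = b for the weights of the linear value approximation.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; more samples may be needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lstd_lambda(transitions, phi, gamma, lam):
    """Estimate weights theta such that V(x) is approximated by phi(x) . theta.

    transitions: consecutive (state, reward, next_state) triples from ONE
                 trajectory generated by the fixed policy (an assumption of
                 this sketch).
    phi:         hypothetical feature map, state -> list of floats.
    gamma, lam:  discount factor and trace parameter lambda.
    """
    transitions = list(transitions)
    d = len(phi(transitions[0][0]))
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    z = list(phi(transitions[0][0]))  # eligibility trace, z_0 = phi(x_0)
    for x, r, x_next in transitions:
        f, f_next = phi(x), phi(x_next)
        diff = [f[j] - gamma * f_next[j] for j in range(d)]
        for i in range(d):
            b[i] += z[i] * r                  # b += z_t * r_t
            for j in range(d):
                A[i][j] += z[i] * diff[j]     # A += z_t (phi_t - gamma phi_{t+1})^T
        z = [lam * gamma * z[i] + f_next[i] for i in range(d)]
    return solve(A, b)


# Toy check on a deterministic two-state cycle with one-hot features:
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
phi = lambda s: [1.0, 0.0] if s == 0 else [0.0, 1.0]
traj = [(t % 2, 1.0 if t % 2 == 0 else 0.0, (t + 1) % 2) for t in range(20)]
theta = lstd_lambda(traj, phi, gamma=0.5, lam=0.5)
```

On this toy chain the true values solve V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), giving V(0) = 4/3 and V(1) = 2/3. Because the features are one-hot and the chain is deterministic, `theta` should match these values up to rounding. With stochastic transitions or a restricted feature space, the estimate is only approximate, which is the regime the bounds in the abstract address.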

research
12/12/2012

Value Function Approximation in Zero-Sum Markov Games

This paper investigates value function approximation in the context of z...
research
10/27/2020

Temporal Difference Learning as Gradient Splitting

Temporal difference learning with linear function approximation is a pop...
research
06/11/2013

Stochastic approximation for speeding up LSTD (and LSPI)

We propose a stochastic approximation (SA) based method with randomizati...
research
08/17/2023

Polynomial Bounds for Learning Noisy Optical Physical Unclonable Functions and Connections to Learning With Errors

It is shown that a class of optical physical unclonable functions (PUFs)...
research
09/13/2016

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting

We derive a new proof to show that the incremental resparsification algo...
research
11/19/2010

Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view

We investigate projection methods, for evaluating a linear approximation...
research
07/13/2020

Reconstruction of Line-Embeddings of Graphons

Consider a random graph process with n vertices corresponding to points ...