Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces

05/25/2018
by Haifang Li, et al.

Policy evaluation with linear function approximation is an important problem in reinforcement learning. In high-dimensional feature spaces, the problem becomes extremely hard with respect to both computational efficiency and approximation quality. We propose a new algorithm, LSTD(λ)-RP, which combines random projection with eligibility traces to tackle these two challenges. We carry out a theoretical analysis of LSTD(λ)-RP and provide upper bounds on the estimation error, the approximation error, and the total generalization error. These results show that LSTD(λ)-RP benefits from both random projection and eligibility traces, and that it can achieve better performance than the prior LSTD-RP and LSTD(λ) algorithms.
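
To make the combination of the two ingredients concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single trajectory of high-dimensional features, a Gaussian projection matrix with entries drawn from N(0, 1/d), and an optional small ridge term for numerical stability; none of these details are stated in the abstract. The function name `lstd_lambda_rp` and all parameter names are hypothetical.

```python
import numpy as np

def lstd_lambda_rp(features, rewards, gamma, lam, d, seed=0, ridge=1e-6):
    """Sketch of LSTD(lambda) run on randomly projected features.

    features: (n+1, D) array holding phi(s_0), ..., phi(s_n)
    rewards:  (n,) array holding r_0, ..., r_{n-1}
    d:        dimension of the projected feature space (d <= D is typical)
    """
    rng = np.random.default_rng(seed)
    D = features.shape[1]
    # Assumption: Gaussian random projection with entries ~ N(0, 1/d).
    P = rng.normal(0.0, 1.0 / np.sqrt(d), size=(D, d))
    psi = features @ P                      # projected features, (n+1, d)

    A = ridge * np.eye(d)                   # assumed ridge term, not from the source
    b = np.zeros(d)
    z = np.zeros(d)                         # eligibility trace
    for t in range(len(rewards)):
        z = gamma * lam * z + psi[t]        # decay the trace, add current feature
        A += np.outer(z, psi[t] - gamma * psi[t + 1])
        b += z * rewards[t]

    theta = np.linalg.solve(A, b)           # weights in the projected space
    return theta, P
```

The value estimate for a state with original features `phi` is then `(phi @ P) @ theta`. The linear system being solved has size d rather than D, which is where the computational saving from projection would come from in this sketch.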


