Stochastic approximation for speeding up LSTD (and LSPI)

06/11/2013
by L. A. Prashanth et al.

We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm. Our method yields an O(d) improvement in complexity over regular LSTD, where d is the dimension of the data. We provide convergence rate results for the proposed method, both in high probability and in expectation. Moreover, we establish that using our scheme in place of LSTD does not affect the rate at which the approximate value function converges to the true value function; hence a low-complexity LSPI variant that uses our SA based scheme has performance bounds of the same order as regular LSPI. These rate results, coupled with the low complexity of our method, make it attractive for big data settings, where d is large. Furthermore, we analyze a similar low-complexity alternative for least squares regression and provide finite-time bounds there. We demonstrate the practicality of our method for LSTD empirically by combining it with the LSPI algorithm in a traffic signal control application. We also conduct a second set of experiments that combines the SA based low-complexity variant for least squares regression with the LinUCB algorithm for contextual bandits, using the large-scale news recommendation dataset from Yahoo.
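To make the idea concrete, here is a minimal Python sketch of an SA-style LSTD update of the kind described above: instead of forming and solving the d x d LSTD linear system, transitions are drawn uniformly at random from the stored batch and a TD(0)-like O(d) update is applied, with iterate averaging. The function name, step-size schedule, and averaging choice are illustrative assumptions, not necessarily the paper's exact algorithm.

import numpy as np

def fast_lstd_sa(phi, phi_next, rewards, gamma=0.95, n_iters=10000, seed=0):
    # phi, phi_next: (T, d) feature matrices for states s_i and next states s'_i
    # rewards: (T,) array of observed rewards
    # Rather than forming and solving the d x d LSTD system (O(d^2) per sample),
    # repeatedly sample a stored transition uniformly at random and apply a
    # TD(0)-style update, which costs O(d) per iteration.
    rng = np.random.default_rng(seed)
    T, d = phi.shape
    theta = np.zeros(d)
    theta_avg = np.zeros(d)                      # Polyak-Ruppert iterate averaging
    for n in range(n_iters):
        i = rng.integers(T)                      # random sample index (assumption: uniform)
        td_err = rewards[i] + gamma * phi_next[i] @ theta - phi[i] @ theta
        theta = theta + (1.0 / np.sqrt(n + 1)) * td_err * phi[i]   # illustrative step size
        theta_avg += (theta - theta_avg) / (n + 1)
    return theta_avg

Each iteration touches only a single d-dimensional feature vector, so the per-iteration cost is O(d), in contrast to the O(d^2) per-sample cost of recursive LSTD (or O(d^3) for a direct solve of the LSTD system).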


