Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging

We consider d-dimensional linear stochastic approximation (LSA) algorithms with a constant step-size and Polyak-Ruppert (PR) averaging of iterates. LSA algorithms are widely applied in machine learning and reinforcement learning (RL), where the goal is to compute θ_* ∈ R^d (an optimum or a fixed point) from noisy data using O(d) updates per iteration. In this paper we are motivated by the RL problem of policy evaluation from experience replay using the temporal difference (TD) class of learning algorithms, which are themselves LSA algorithms. For LSA with a constant step-size and PR averaging, we provide bounds on the mean squared error (MSE) after t iterations. We assume that the data has finite variance (with underlying distribution P) and that the expected dynamics is Hurwitz. For a given LSA with PR averaging and a data distribution P satisfying these assumptions, we show that there exists a range of constant step-sizes for which the MSE decays as O(1/t). We examine the conditions under which a constant step-size can be chosen uniformly over a class of data distributions, and show that not all data distributions 'admit' such a uniform constant step-size. We also propose a heuristic step-size tuning algorithm that selects a constant step-size for a given LSA and a given data distribution P. We compare our results with related work and discuss their implications for TD algorithms that are LSA algorithms.
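To make the setting concrete, here is a minimal sketch of a constant step-size LSA with PR averaging on a synthetic data stream. The recursion θ_{t+1} = θ_t + α(b_t - A_t θ_t) with running iterate average is the standard LSA form; the function names (`lsa_pr_average`, `sample`), the noise model, and all constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lsa_pr_average(sample, theta0, alpha, num_iters, rng):
    """Constant step-size LSA with Polyak-Ruppert iterate averaging.

    `sample(rng)` is assumed to return a noisy pair (A_t, b_t) whose
    means (A, b) define the target theta_* = A^{-1} b. The step-size
    alpha is held constant; the returned estimate is the running
    average of the iterates (PR averaging).
    """
    theta = theta0.copy()
    theta_bar = theta0.copy()
    for t in range(1, num_iters + 1):
        A_t, b_t = sample(rng)
        theta = theta + alpha * (b_t - A_t @ theta)  # LSA update; O(d) when A_t is rank-one (e.g. TD)
        theta_bar += (theta - theta_bar) / t         # incremental PR average of iterates
    return theta_bar

# Toy usage with a synthetic finite-variance stream (all values illustrative):
rng = np.random.default_rng(0)
d = 4
A = np.eye(d) + 0.5 * np.diag(np.arange(d))  # positive definite, so -A is Hurwitz
b = rng.standard_normal(d)

def sample(rng):
    # zero-mean perturbations of (A, b): a finite-variance data stream
    return (A + 0.1 * rng.standard_normal((d, d)),
            b + 0.1 * rng.standard_normal(d))

theta_bar = lsa_pr_average(sample, np.zeros(d), alpha=0.05, num_iters=20_000, rng=rng)
print(np.linalg.norm(theta_bar - np.linalg.solve(A, b)))  # distance to theta_*
```

Under the Hurwitz and finite-variance assumptions above, the abstract's result says there is a range of constant α for which the MSE of `theta_bar` decays as O(1/t); choosing α too large breaks stability, which is what motivates the step-size tuning question studied in the paper.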
