The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

10/27/2021
by   Vivek Borkar, et al.
The paper concerns convergence and asymptotic statistics for stochastic approximation driven by Markovian noise:

θ_{n+1} = θ_n + α_{n+1} f(θ_n, Φ_{n+1}), n ≥ 0,

in which each θ_n ∈ ℝ^d, {Φ_n} is a Markov chain on a general state space X with stationary distribution π, and f : ℝ^d × X → ℝ^d. In addition to standard Lipschitz bounds on f and conditions on the vanishing step-size sequence {α_n}, it is assumed that the associated mean-flow ODE, d/dt ϑ_t = f̄(ϑ_t) with f̄(θ) = E[f(θ, Φ)] and Φ ∼ π, is globally asymptotically stable with stationary point denoted θ*. Moreover, the ODE@∞, defined with respect to the vector field f̄_∞(θ) := lim_{r→∞} r^{-1} f̄(rθ), θ ∈ ℝ^d, is asymptotically stable. The main contributions are summarized as follows:

(i) The sequence {θ_n} is convergent if Φ is geometrically ergodic, subject to compatible bounds on f. The remaining results are established under a stronger assumption on the Markov chain: a slightly weaker version of the Donsker-Varadhan Lyapunov drift condition known as (DV3).

(ii) A Lyapunov function is constructed for the joint process {θ_n, Φ_n} that implies convergence of {θ_n} in L_4.

(iii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error z_n := (θ_n − θ*)/√α_n. Moment bounds combined with the CLT imply convergence of the normalized covariance, lim_{n→∞} E[z_n z_nᵀ] = Σ_θ, where Σ_θ is the asymptotic covariance appearing in the CLT.

(iv) An example is provided in which the Markov chain Φ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded.
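The recursion above can be illustrated with a minimal numerical sketch. The chain, its transition matrix, and the function f below are toy choices of ours (not from the paper): a two-state Markov chain Φ, and f picked so that the mean vector field is f̄(θ) = −(θ − θ*), which is globally asymptotically stable with stationary point θ* = 1.

```python
import numpy as np

# Illustrative sketch of stochastic approximation with Markovian noise:
#   theta_{n+1} = theta_n + alpha_{n+1} * f(theta_n, Phi_{n+1}).
# All concrete values here (P, noise, theta_star) are assumptions for the demo.

rng = np.random.default_rng(0)

theta_star = 1.0
P = np.array([[0.9, 0.1],       # transition matrix of the Markov chain Phi
              [0.2, 0.8]])
pi = np.array([2/3, 1/3])       # stationary distribution: pi @ P == pi

# State-dependent noise with zero mean under pi, so the mean field under
# pi is f_bar(theta) = -(theta - theta_star).
noise = np.array([1.0, -2.0])
assert abs(pi @ noise) < 1e-12

def f(theta, state):
    return -(theta - theta_star) + noise[state]

theta, state = 0.0, 0
N = 200_000
for n in range(1, N + 1):
    state = rng.choice(2, p=P[state])   # Phi_{n+1} ~ P(Phi_n, .)
    alpha = 1.0 / n                     # vanishing step-size sequence
    theta += alpha * f(theta, state)

print(theta)  # close to theta_star = 1.0
```

Despite the serially correlated (Markovian) noise, the iterates track the mean-flow ODE d/dt ϑ = −(ϑ − θ*) and converge to θ*; the fluctuation (θ_n − θ*)/√α_n is exactly the normalized error z_n whose CLT the paper studies.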


Related research:

- 01/30/2021: On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning. This paper studies the exponential stability of random matrix products d...
- 07/10/2022: Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation. This paper provides a finite-time analysis of linear stochastic approxim...
- 02/15/2021: On Riemannian Stochastic Approximation Schemes with Fixed Step-Size. This paper studies fixed step-size stochastic approximation (SA) schemes...
- 11/12/2014: On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence. We provide non-asymptotic bounds for the well-known temporal difference ...
- 09/06/2023: The Curse of Memory in Stochastic Approximation: Extended Version. Theory and application of stochastic approximation (SA) has grown within...
- 09/30/2020: Accelerating Optimization and Reinforcement Learning with Quasi-Stochastic Approximation. The ODE method has been a workhorse for algorithm design and analysis si...
- 07/05/2023: Stability of Q-Learning Through Design and Optimism. Q-learning has become an important part of the reinforcement learning to...
