# The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

The paper concerns convergence and asymptotic statistics for stochastic approximation driven by Markovian noise:

θ_{n+1} = θ_n + α_{n+1} f(θ_n, Φ_{n+1}), n ≥ 0,

in which each θ_n ∈ ℝ^d, {Φ_n} is a Markov chain on a general state space X with stationary distribution π, and f : ℝ^d × X → ℝ^d. In addition to standard Lipschitz bounds on f and conditions on the vanishing step-size sequence {α_n}, it is assumed that the associated ODE, d/dt ϑ_t = f̅(ϑ_t), is globally asymptotically stable with stationary point denoted θ*, where f̅(θ) = E[f(θ, Φ)] with Φ ∼ π. Moreover, the ODE@∞, defined with respect to the vector field f̅_∞(θ) := lim_{r→∞} r^{-1} f̅(rθ), θ ∈ ℝ^d, is asymptotically stable.

The main contributions are summarized as follows:

- (i) The sequence {θ_n} is convergent if Φ is geometrically ergodic, subject to compatible bounds on f. The remaining results are established under a stronger assumption on the Markov chain: a slightly weaker version of the Donsker-Varadhan Lyapunov drift condition known as (DV3).
- (ii) A Lyapunov function is constructed for the joint process {θ_n, Φ_n} that implies convergence of {θ_n} in L_4.
- (iii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error z_n := (θ_n − θ*)/√α_n. Moment bounds combined with the CLT imply convergence of the normalized covariance, lim_{n→∞} E[z_n z_nᵀ] = Σ_θ, where Σ_θ is the asymptotic covariance appearing in the CLT.
- (iv) An example is provided in which the Markov chain Φ is geometrically ergodic but does not satisfy (DV3); the algorithm is convergent, yet the second moment is unbounded.
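The recursion above can be illustrated with a minimal numerical sketch (not from the paper): a scalar linear stochastic approximation driven by a two-state Markov chain, with f(θ, x) = b(x) − θ so that f̅(θ) = E_π[b(Φ)] − θ and θ* = E_π[b(Φ)]. The chain, the function b, and the step-size choice α_n = 1/n are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state Markov chain on X = {0, 1} (illustrative choice).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# Its stationary distribution, solving pi P = pi, is pi = (2/3, 1/3).
pi = np.array([2/3, 1/3])

# f(theta, x) = b(x) - theta, so fbar(theta) = E_pi[b(Phi)] - theta
# and the stationary point is theta* = E_pi[b(Phi)].
b = np.array([1.0, 4.0])
theta_star = pi @ b  # = 2.0

def f(theta, x):
    return b[x] - theta

# SA recursion: theta_{n+1} = theta_n + alpha_{n+1} f(theta_n, Phi_{n+1}),
# with vanishing step size alpha_n = 1/n.
theta, x = 0.0, 0
for n in range(1, 200_001):
    x = rng.choice(2, p=P[x])   # one step of the Markov chain Phi
    theta += (1.0 / n) * f(theta, x)

print(theta)  # approaches theta* = 2.0 as n grows
```

Here the iterates track the ODE d/dt ϑ_t = f̅(ϑ_t) = θ* − ϑ_t, whose global asymptotic stability is exactly the kind of condition the paper assumes.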
