A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

09/29/2021
by Sihan Zeng, et al.

We study a novel two-time-scale stochastic gradient method for solving optimization problems where the gradient samples are generated from a time-varying Markov random process parameterized by the underlying optimization variable. These time-varying samples make the stochastic gradient biased and dependent, which can potentially lead to the divergence of the iterates. To address this issue, we consider a two-time-scale update scheme, where one scale is used to estimate the true gradient from the Markovian samples and the other scale is used to update the decision variable with the estimated gradient. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for its convergence rates under different structural assumptions on the objective function, namely strong convexity, convexity, non-convexity under the Polyak-Łojasiewicz (PL) condition, and general non-convexity. Our second contribution is to apply our framework to study the performance of the popular actor-critic methods in solving stochastic control and reinforcement learning problems. First, we study an online natural actor-critic algorithm for the linear-quadratic regulator and show that it achieves a convergence rate of 𝒪(k^-2/3); to our knowledge, this is the first such result in the literature. Second, we consider the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of 𝒪(k^-2/5), which recovers the best known rate derived specifically for this problem. Finally, we support our theoretical analysis with numerical simulations that visualize the convergence rates.
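To make the update scheme concrete, below is a minimal Python sketch of the two coupled iterates, assuming a toy quadratic objective f(x) = 0.5‖x‖² and a simple AR(1) noise chain as hypothetical stand-ins for the x-dependent Markovian sampling; the step-size exponents are illustrative only and are not the specific choices analyzed in the paper.

```python
import numpy as np

def two_time_scale_sgd(x0, num_iters=10_000, seed=0):
    """Sketch of a two-time-scale stochastic gradient update.

    Stand-ins (not from the paper): the objective is f(x) = 0.5 * ||x||^2 and
    the gradient samples are corrupted by an AR(1) noise chain, which makes
    them biased and correlated over time; unlike the general setting, this toy
    chain does not actually depend on x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()   # decision variable (slow iterate)
    y = np.zeros_like(x)                     # gradient estimate (fast iterate)
    noise = np.zeros_like(x)                 # state of the Markov noise chain

    for k in range(1, num_iters + 1):
        alpha = 1.0 / k ** (2.0 / 3.0)       # slow step size: updates x
        beta = 1.0 / k ** 0.5                # fast step size: tracks the gradient (beta >= alpha)

        # Markovian sample: correlated across iterations.
        noise = 0.9 * noise + 0.1 * rng.standard_normal(x.shape)
        g = x + noise                        # biased, dependent sample of grad f(x) = x

        y += beta * (g - y)                  # fast time scale: refine the gradient estimate
        x -= alpha * y                       # slow time scale: gradient step with the estimate

    return x

if __name__ == "__main__":
    # Final iterate should end up near the minimizer 0 of the toy objective.
    print(two_time_scale_sgd(np.ones(5)))
```

The fast iterate y acts as a running average that filters the bias and correlation of the Markovian samples before they reach the slow update of x, which is what allows the decision variable to behave as if it received (nearly) unbiased gradients.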
