Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

05/07/2020
by   Tengyu Xu, et al.

As an important class of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) methods are often executed in one of two ways to find optimal policies. In the first, nested-loop design, each of the actor's policy updates is followed by an entire loop of the critic's value-function updates, and the finite-sample analysis of such AC and NAC algorithms has recently been well established. The second, two time-scale design, in which the actor and critic update simultaneously but with different learning rates, has far fewer tuning parameters than the nested-loop design and is hence substantially easier to implement. Although two time-scale AC and NAC have been shown to converge in the literature, their finite-sample convergence rates have not been established. In this paper, we provide the first such non-asymptotic convergence rates for two time-scale AC and NAC under Markovian sampling and with the actor using a general policy class approximation. We show that two time-scale AC requires an overall sample complexity of order O(ϵ^-2.5 log^3(ϵ^-1)) to attain an ϵ-accurate stationary point, and that two time-scale NAC requires an overall sample complexity of order O(ϵ^-4 log^2(ϵ^-1)) to attain an ϵ-accurate globally optimal point. We develop novel techniques for bounding the actor's bias error due to dynamically changing Markovian sampling and for analyzing the convergence rate of the linear critic with dynamically changing basis functions and transition kernel.
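To make the two time-scale design concrete, the following is a minimal sketch of simultaneous actor and critic updates with different learning rates. It is not the paper's algorithm: it uses a tabular softmax actor and a tabular TD(0) critic on a toy two-state MDP invented here for illustration, whereas the paper treats a general policy class for the actor and a linear critic. The key feature it does reproduce is the step-size separation, with the critic on the faster time-scale and the actor on the slower one, along a single Markovian trajectory.

```python
import numpy as np

# Toy 2-state, 2-action MDP (invented for illustration).
# P[s, a] -> next-state distribution; R[s, a] -> reward.
np.random.seed(0)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9
n_states, n_actions = 2, 2

theta = np.zeros((n_states, n_actions))  # actor (policy) parameters
V = np.zeros(n_states)                   # critic (value) estimates

def policy(s):
    """Softmax policy over actions in state s."""
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

s = 0
for t in range(1, 20001):
    # Two time-scales: the critic step size beta_t decays more slowly
    # (stays larger) than the actor step size alpha_t, so the critic
    # tracks the current policy's value function between actor updates.
    alpha_t = 0.5 / t ** 0.9   # actor: slower time-scale
    beta_t = 0.5 / t ** 0.6    # critic: faster time-scale

    probs = policy(s)
    a = np.random.choice(n_actions, p=probs)
    s_next = np.random.choice(n_states, p=P[s, a])
    r = R[s, a]

    # TD error drives both updates.
    delta = r + gamma * V[s_next] - V[s]

    # Critic: TD(0) update.
    V[s] += beta_t * delta

    # Actor: policy-gradient step using the critic's TD error.
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += alpha_t * delta * grad_log

    # Markovian sampling: continue along the same trajectory.
    s = s_next
```

Both updates happen at every step of one trajectory; there is no inner critic loop, which is exactly why the only tuning knobs are the two step-size schedules.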


Related research

- 04/27/2020: Improving Sample Complexity Bounds for Actor-Critic Algorithms ("The actor-critic (AC) algorithm is a popular method to find an optimal p...")
- 05/04/2020: A Finite Time Analysis of Two Time-Scale Actor Critic Methods ("Actor-critic (AC) methods have exhibited great empirical success compare...")
- 02/23/2021: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality ("Designing off-policy reinforcement learning algorithms is typically a ve...")
- 06/02/2021: On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction ("In this paper, we study the convergence properties of off-policy policy ...")
- 12/31/2020: Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup ("Asynchronous and parallel implementation of standard reinforcement learn...")
- 08/18/2022: Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator ("The actor-critic (AC) reinforcement learning algorithms have been the po...")
- 11/04/2021: Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch ("In this paper, we establish the global optimality and convergence rate o...")
