A Finite Time Analysis of Two Time-Scale Actor Critic Methods

05/04/2020
by   Yue Wu, et al.

Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning to estimate the policy gradient. Under the two time-scale learning rate schedule, the asymptotic convergence of AC has been well studied in the literature. However, the non-asymptotic convergence and finite sample complexity of actor-critic methods remain largely open. In this work, we provide a non-asymptotic analysis of two time-scale actor-critic methods in the non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., ‖∇J(θ)‖_2^2 < ϵ) of the non-concave performance function J(θ), with Õ(ϵ^{-2.5}) sample complexity. To the best of our knowledge, this is the first work to provide a finite-time analysis and a sample complexity bound for two time-scale actor-critic methods.
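To make the two time-scale idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes a hypothetical two-state, two-action toy MDP, a tabular softmax actor, a tabular TD(0) critic with a discount factor, and polynomially decaying step sizes whose constants and exponents (0.6 for the actor, 0.4 for the critic) are chosen only for illustration. The point it shows is that the critic's step size decays more slowly than the actor's, so the critic runs on the faster time scale, and that all samples come from one Markovian trajectory rather than i.i.d. draws.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def step(state, action, rng):
    """Hypothetical toy MDP: action 1 usually leads to state 1, which pays reward 1."""
    p_to_1 = 0.9 if action == 1 else 0.1
    next_state = 1 if rng.random() < p_to_1 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def two_timescale_actor_critic(num_steps=20000, gamma=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: softmax preferences per (state, action)
    w = [0.0, 0.0]                    # critic: tabular value estimate per state
    state = 0
    for t in range(num_steps):
        # Illustrative schedules (assumed): the actor step decays faster than the
        # critic step, so the actor moves on the slower time scale.
        alpha = 0.5 / (1 + t) ** 0.6  # actor step size
        beta = 0.5 / (1 + t) ** 0.4   # critic step size

        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        next_state, reward = step(state, action, rng)

        # Critic: TD(0) error and value update.
        delta = reward + gamma * w[next_state] - w[state]
        w[state] += beta * delta

        # Actor: policy gradient step, using the TD error as the advantage estimate.
        # For a softmax policy, d/d theta[s][a] of log pi(action | s) is
        # 1{a == action} - pi(a | s).
        for a in range(2):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += alpha * delta * grad_log

        # No reset: consecutive samples come from a single Markov chain (non-i.i.d.).
        state = next_state
    return theta, w


if __name__ == "__main__":
    theta, w = two_timescale_actor_critic()
    for s in range(2):
        print("state", s, "policy", softmax(theta[s]), "value", w[s])
```

In this toy problem action 1 is the better choice in both states, so the learned policy should shift probability toward it. Swapping the two exponents would make the actor the faster learner, which runs against the two time-scale schedule the abstract describes.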


research
05/07/2020

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

As an important type of reinforcement learning algorithms, actor-critic ...
research
01/31/2022

Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

We propose a single time-scale actor-critic algorithm to solve the linea...
research
05/26/2021

Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation

In this paper, we develop a novel variant of off-policy natural actor-cr...
research
05/20/2023

Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

The average reward criterion is relatively less studied as most existing...
research
10/18/2019

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Prob...
research
06/13/2021

Characterizing the Gap Between Actor-Critic and Policy Gradient

Actor-critic (AC) methods are ubiquitous in reinforcement learning. Alth...
research
01/29/2022

Zeroth-Order Actor-Critic

Zeroth-order optimization methods and policy gradient based first-order ...
