Improving Sample Complexity Bounds for Actor-Critic Algorithms

04/27/2020
by Tengyu Xu, et al.

The actor-critic (AC) algorithm is a popular method for finding an optimal policy in reinforcement learning (RL). Finite-sample convergence rates for the AC and natural actor-critic (NAC) algorithms have been established recently, but only under independent and identically distributed (i.i.d.) sampling with a single-sample update at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data at each iteration, and with the actor using a general policy class approximation. We show that the overall sample complexity for mini-batch AC to attain an ϵ-accurate stationary point improves the best known sample complexity of AC by an order of O((1/ϵ) log(1/ϵ)). We also show that the overall sample complexity for mini-batch NAC to attain an ϵ-accurate globally optimal point improves the known sample complexity of natural policy gradient (NPG) by an order of O((1/ϵ) / log(1/ϵ)). Our study develops several novel techniques for the finite-sample analysis of RL algorithms, including handling the bias error due to mini-batch Markovian sampling and exploiting the self-variance-reduction property to improve the convergence analysis of NAC.
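
As a rough illustration of the setting described in the abstract, the following Python sketch runs a mini-batch actor-critic loop in which every batch is a run of consecutive transitions from a single trajectory (Markovian sampling) rather than a set of i.i.d. draws. It is not the paper's algorithm: the two-state MDP, the tabular softmax actor, the tabular TD(0) critic, the use of the TD error as an advantage estimate, the step sizes, and all names (`P_TO_1`, `sample_batch`, `train`) are assumptions made for this example.

```python
import math
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P_TO_1[s][a] is the probability that the next state is 1.
P_TO_1 = [[0.1, 0.9], [0.2, 0.8]]
GAMMA = 0.9


def reward(s, a):
    # Reward 1 for every step spent in state 1, whatever the action.
    return 1.0 if s == 1 else 0.0


def policy(theta, s):
    # Tabular softmax policy over the two actions in state s.
    m = max(theta[s])
    exps = [math.exp(t - m) for t in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def sample_batch(rng, theta, s, size):
    # Collect `size` consecutive transitions from ONE trajectory, so the
    # samples are Markovian (correlated) rather than i.i.d.
    batch = []
    for _ in range(size):
        a = 0 if rng.random() < policy(theta, s)[0] else 1
        s_next = 1 if rng.random() < P_TO_1[s][a] else 0
        batch.append((s, a, reward(s, a), s_next))
        s = s_next
    return batch, s


def train(iters=500, batch_size=16, critic_steps=5,
          lr_critic=0.1, lr_actor=0.1, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor parameters
    v = [0.0, 0.0]                    # tabular critic
    s = 0
    for _ in range(iters):
        # Critic: a few mini-batch TD(0) steps under the current policy.
        for _ in range(critic_steps):
            batch, s = sample_batch(rng, theta, s, batch_size)
            grad = [0.0, 0.0]
            for (si, _a, r, sn) in batch:
                grad[si] += r + GAMMA * v[sn] - v[si]
            for i in range(2):
                v[i] += lr_critic * grad[i] / batch_size
        # Actor: one mini-batch policy-gradient step, with the TD error
        # standing in for the advantage (an assumption of this sketch).
        batch, s = sample_batch(rng, theta, s, batch_size)
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for (si, a, r, sn) in batch:
            delta = r + GAMMA * v[sn] - v[si]
            pi = policy(theta, si)
            for b in range(2):
                # d/d theta[si][b] of log pi(a | si) for a softmax policy.
                grad[si][b] += delta * ((1.0 if b == a else 0.0) - pi[b])
        for i in range(2):
            for b in range(2):
                theta[i][b] += lr_actor * grad[i][b] / batch_size
    return theta, v


theta, v = train()
probs = [policy(theta, s)[1] for s in range(2)]
print("P(action 1 | s):", [round(p, 3) for p in probs])
```

In this toy problem action 1 tends to lead to the rewarding state, so the learned policy should come to favor it in both states. Averaging over a batch reduces the variance of each update, but because the samples within a batch are correlated, the averaged estimate is biased; that bias is the kind of error the abstract says the analysis has to handle.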

research
05/07/2020

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

As an important type of reinforcement learning algorithms, actor-critic ...
research
12/31/2020

Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup

Asynchronous and parallel implementation of standard reinforcement learn...
research
01/17/2018

An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients

In this technical report, we consider an approach that combines the PPO ...
research
02/18/2021

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

In this paper, we provide finite-sample convergence guarantees for an of...
research
02/23/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Designing off-policy reinforcement learning algorithms is typically a ve...
research
09/08/2021

Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Actor-critic (AC) algorithms have been widely adopted in decentralized m...
research
03/24/2021

Multi-Agent Off-Policy TD Learning: Finite-Time Analysis with Near-Optimal Sample Complexity and Communication Complexity

The finite-time convergence of off-policy TD learning has been comprehen...
