Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation

02/28/2022
by Jing Dong, et al.

We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation. Stochastic gradient descent ascent is applied with an adaptive proximal term to obtain robust learning rates. We establish the first efficient convergence result for primal-dual actor-critic, with a convergence rate of 𝒪(√(ln(N d G²)/N)) under Markovian sampling, where G is the element-wise maximum of the gradient, N is the number of iterations, and d is the dimension of the gradient. Our result requires only the Polyak-Łojasiewicz condition on the dual variables, which is easy to verify and applies to a wide range of reinforcement learning (RL) scenarios. The algorithm and analysis are general enough to be applied to other RL settings, such as multi-agent RL. Empirical results on OpenAI Gym continuous control tasks corroborate our theoretical findings.
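To illustrate the algorithmic core described above, the following is a minimal, hypothetical sketch of stochastic gradient descent ascent with an adaptive (AdaGrad-style) proximal scaling on a toy nonconvex-nonconcave objective. It is not the paper's exact method; the objective f, the step size, and the noise model are illustrative assumptions.

```python
# Minimal sketch: stochastic gradient descent ascent (SGDA) with a
# per-coordinate adaptive proximal term, on a toy nonconvex-nonconcave
# objective f(x, y) = sin(x)·y - 0.5·y²·cos(x) (element-wise).
# All constants and the objective are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def f_grads(x, y, noise=0.1):
    """Stochastic gradients of the toy objective with additive Gaussian noise."""
    gx = np.cos(x) * y + 0.5 * y**2 * np.sin(x) + noise * rng.standard_normal(x.shape)
    gy = np.sin(x) - y * np.cos(x) + noise * rng.standard_normal(y.shape)
    return gx, gy

d = 2
x, y = rng.standard_normal(d), rng.standard_normal(d)  # primal (actor) / dual (critic) variables
Gx, Gy = np.zeros(d), np.zeros(d)                      # accumulated squared gradients
eta, eps, N = 0.1, 1e-8, 2000

for n in range(N):
    gx, gy = f_grads(x, y)
    # Adaptive proximal term: scale each coordinate by its accumulated gradient magnitude.
    Gx += gx**2
    Gy += gy**2
    x -= eta * gx / (np.sqrt(Gx) + eps)  # descent step on the primal variable
    y += eta * gy / (np.sqrt(Gy) + eps)  # ascent step on the dual variable

print("final primal:", x, "final dual:", y)
```

The per-coordinate scaling plays the role of the adaptive proximal term: coordinates with large accumulated gradients take smaller effective steps, which is what yields the robust learning rates referred to in the abstract.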
