Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

by Zuyue Fu et al.

We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms. While most existing works on actor-critic employ bi-level or two-timescale updates, we focus on the more practical single-timescale setting, where the actor and critic are updated simultaneously. Specifically, in each iteration, the critic update is obtained by applying the Bellman evaluation operator only once, while the actor is updated in the policy gradient direction computed using the critic. Moreover, we consider two function approximation settings, in which the actor and critic are represented by either linear functions or deep neural networks. In both cases, we prove that the actor sequence converges to a globally optimal policy at a sublinear O(K^-1/2) rate, where K is the number of iterations. To the best of our knowledge, we are the first to establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation. Moreover, under the broader scope of policy optimization with nonlinear function approximation, we are the first to prove that actor-critic with deep neural networks finds the globally optimal policy at a sublinear rate.
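The single-timescale scheme described above can be sketched on a small tabular MDP: in each iteration, the critic applies the Bellman evaluation operator once, and the actor takes a policy-gradient step using the current critic. Everything below (the random MDP, the softmax parameterization, the step sizes, and the uniform state weighting in the actor update) is an illustrative assumption, not the paper's exact setting, which uses linear or neural function approximation.

```python
# Hedged sketch of single-timescale actor-critic on a small random MDP.
# The MDP, step sizes, and softmax parameterization are illustrative
# assumptions; the paper analyzes linear and neural approximation.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # transition kernel P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # reward function R[s, a]

theta = np.zeros((nS, nA))  # actor parameters (softmax policy)
Q = np.zeros((nS, nA))      # critic estimate of Q^pi
alpha, beta = 0.1, 0.1      # actor/critic step sizes on the SAME timescale

def policy(theta):
    """Softmax policy pi(a|s) from tabular parameters."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for k in range(2000):
    pi = policy(theta)
    # Critic: apply the Bellman evaluation operator T^pi exactly once.
    V = (pi * Q).sum(axis=1)          # V(s') implied by the current critic
    TQ = R + gamma * P @ V            # (T^pi Q)(s, a)
    Q = Q + beta * (TQ - Q)
    # Actor: policy-gradient step computed from the current critic
    # (uniform state weighting is an assumption for this sketch).
    adv = Q - (pi * Q).sum(axis=1, keepdims=True)
    theta = theta + alpha * pi * adv  # softmax policy-gradient direction
```

Because both updates share one loop with comparable step sizes, neither the actor nor the critic is held (near-)fixed while the other converges, which is exactly the coupling that distinguishes this setting from bi-level or two-timescale analyses.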






