Neural Policy Gradient Methods: Global Optimality and Rates of Convergence

08/29/2019
by   Lingxiao Wang, et al.

Policy gradient methods with actor-critic schemes have demonstrated tremendous empirical success, especially when the actors and critics are parameterized by neural networks. However, it remains unclear whether such "neural" policy gradient methods converge to globally optimal policies, or whether they converge at all. We answer both questions affirmatively in the overparameterized regime. In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate. We also show that neural vanilla policy gradient converges sublinearly to a stationary point. Meanwhile, by relating the suboptimality of the stationary points to the representation power of the neural actor and critic classes, we prove the global optimality of all stationary points under mild regularity conditions. In particular, we show that a key to global optimality and convergence is the "compatibility" between the actor and critic, which is ensured by sharing neural architectures and random initializations across the actor and critic. To the best of our knowledge, our analysis establishes the first global optimality and convergence guarantees for neural policy gradient methods.
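The vanilla policy gradient update that the abstract analyzes can be sketched on a toy problem. The snippet below is a minimal illustration only, assuming a tabular softmax policy on a two-armed bandit with a running-average reward baseline; it is not the paper's overparameterized neural actor-critic setting, and all names here are illustrative.

```python
# Minimal vanilla policy gradient (REINFORCE-style) sketch on a
# two-armed bandit with a softmax policy. Toy illustration only;
# not the paper's neural/overparameterized actor-critic setup.
import numpy as np

def softmax(logits):
    # numerically stable softmax over the action logits
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def policy_gradient_bandit(true_means, lr=0.1, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(true_means))  # policy parameters (action logits)
    baseline = 0.0                     # running-average reward baseline
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choice(len(theta), p=probs)
        r = rng.normal(true_means[a], 0.1)     # stochastic reward
        baseline += 0.05 * (r - baseline)      # track mean reward
        # For a softmax policy, grad log pi(a) = e_a - probs.
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        # Ascend the expected return using the advantage r - baseline.
        theta += lr * (r - baseline) * grad_log_pi
    return softmax(theta)

probs = policy_gradient_bandit([0.2, 1.0])
# the higher-mean arm (index 1) should come to dominate the policy
```

The baseline subtraction reduces gradient variance without biasing the update, which is why the policy reliably concentrates on the better arm instead of prematurely reinforcing whichever arm it happens to pull first.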


Related research

- 08/02/2020: Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy. "We study the global convergence and global optimality of actor-critic, o..."
- 05/15/2022: Policy Gradient Method For Robust Reinforcement Learning. "This paper develops the first policy gradient method with global optimal..."
- 01/21/2022: Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search. "We develop a new measure of the exploration/exploitation trade-off in in..."
- 09/29/2021: Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates. "The policy gradient theorem states that the policy should only be update..."
- 11/16/2021: Off-Policy Actor-Critic with Emphatic Weightings. "A variety of theoretically-sound policy gradient algorithms exist for th..."
- 07/13/2019: Stochastic Convergence Results for Regularized Actor-Critic Methods. "In this paper, we present a stochastic convergence proof, under suitable..."
- 11/04/2021: Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch. "In this paper, we establish the global optimality and convergence rate o..."
