Global Optimality Guarantees For Policy Gradient Methods

by   Jalaj Bhandari, et al.
Columbia University

Policy gradients methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood, control problems by performing stochastic gradient descent over a parameterized class of polices. Unfortunately, even for simple control problems solvable by classical techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to local minima. This work identifies structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that policy gradient objective function has no suboptimal local minima despite being non-convex. When these assumptions are relaxed, our work gives conditions under which any local minimum is near-optimal, where the error bound depends on a notion of the expressive capacity of the policy class.


page 1

page 2

page 3

page 4


Global Convergence of Policy Gradient Methods for Linearized Control Problems

Direct policy gradient methods for reinforcement learning and continuous...

A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

In this work, we consider the stochastic optimal control problem in cont...

Learning robust control for LQR systems with multiplicative noise via policy gradient

The linear quadratic regulator (LQR) problem has reemerged as an importa...

Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings

Policy gradient methods have been frequently applied to problems in cont...

Non-asymptotic Analysis of Biased Stochastic Approximation Scheme

Stochastic approximation (SA) is a key method used in statistical learni...

Laplacian Smoothing Gradient Descent

We propose a very simple modification of gradient descent and stochastic...

Please sign up or login with your details

Forgot password? Click here to reset