Global Optimality Guarantees For Policy Gradient Methods

06/05/2019
by Jalaj Bhandari, et al.

Policy gradient methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by classical techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to local minima. This work identifies structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that the policy gradient objective function has no suboptimal local minima despite being non-convex. When these assumptions are relaxed, our work gives conditions under which any local minimum is near-optimal, where the error bound depends on a notion of the expressive capacity of the policy class.
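
As a rough illustration of the kind of stochastic gradient step over a parameterized policy class that the abstract refers to, the sketch below runs a REINFORCE-style policy gradient update with a softmax (tabular) policy on a small, made-up two-state MDP. The toy MDP (P, R), the learning rate, the horizon, and all helper names are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (not the paper's algorithm): REINFORCE-style policy gradient
# with a softmax policy on a hypothetical 2-state / 2-action MDP.
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: P[s, a, s'] transition probabilities, R[s, a] rewards (made-up values).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma, n_states, n_actions = 0.95, 2, 2

def softmax_policy(theta, s):
    """Action probabilities under the softmax (tabular) parameterization."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def rollout(theta, horizon=50):
    """Sample one trajectory and compute discounted returns-to-go."""
    s, traj = 0, []
    for _ in range(horizon):
        p = softmax_policy(theta, s)
        a = rng.choice(n_actions, p=p)
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    G, returns = 0.0, []
    for (_, _, r) in reversed(traj):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    return traj, returns

theta = np.zeros((n_states, n_actions))  # policy parameters
lr = 0.05
for _ in range(2000):
    traj, returns = rollout(theta)
    grad = np.zeros_like(theta)
    for (s, a, _), G in zip(traj, returns):
        p = softmax_policy(theta, s)
        # grad of log pi(a|s) w.r.t. theta[s] for the softmax parameterization
        glog = -p
        glog[a] += 1.0
        grad[s] += G * glog
    theta += lr * grad / len(traj)  # stochastic gradient ascent on expected return

print("Greedy action per state:", theta.argmax(axis=1))
```

Even in this tabular setting the objective is non-convex in theta, which is the kind of landscape the paper's structural conditions are meant to address.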

Related research

01/15/2018 - Global Convergence of Policy Gradient Methods for Linearized Control Problems
Direct policy gradient methods for reinforcement learning and continuous...

02/11/2023 - A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee
In this work, we consider the stochastic optimal control problem in cont...

05/28/2019 - Learning robust control for LQR systems with multiplicative noise via policy gradient
The linear quadratic regulator (LQR) problem has reemerged as an importa...

03/21/2022 - A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima
Non-convex loss functions arise frequently in modern machine learning, a...

10/30/2021 - Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
Policy gradient methods have been frequently applied to problems in cont...

02/02/2019 - Non-asymptotic Analysis of Biased Stochastic Approximation Scheme
Stochastic approximation (SA) is a key method used in statistical learni...

06/17/2018 - Laplacian Smoothing Gradient Descent
We propose a very simple modification of gradient descent and stochastic...
