Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings

10/30/2021
by Matthew Shunshi Zhang, et al.

Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analyses still rely on non-intuitive, impractical, and often opaque conditions. In particular, existing rates are achieved only in limited settings, under strict smoothness and boundedness assumptions. In this work, we establish explicit convergence rates of policy gradient methods without relying on these conditions, instead extending the convergence regime to weakly smooth policy classes with L_2-integrable gradients. We provide intuitive examples to illustrate the insight behind these new conditions. We also characterize sufficient conditions for the ergodicity of near-linear MDPs, which represent an important class of problems. Notably, our analysis shows that fast convergence rates are achievable for both the standard policy gradient and the natural policy gradient algorithms under these assumptions. Lastly, we provide conditions and analysis for the optimality of the converged policies.
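For readers unfamiliar with the algorithm family discussed in the abstract, below is a minimal sketch of a vanilla policy gradient (REINFORCE) update with a softmax (log-linear) policy on a small tabular MDP. It is a generic illustration of the kind of update the paper analyzes, not the paper's own construction; the MDP sizes, step size, and function names are illustrative assumptions.

```python
import numpy as np

# Minimal vanilla policy gradient (REINFORCE) sketch with a softmax policy
# on a randomly generated tabular MDP. All constants are illustrative.

rng = np.random.default_rng(0)

n_states, n_actions, horizon, gamma = 4, 3, 20, 0.97
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(size=(n_states, n_actions))                       # reward table
theta = np.zeros((n_states, n_actions))                           # softmax policy parameters


def policy(s):
    """Action probabilities pi_theta(. | s) under the softmax policy."""
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()


def rollout():
    """Sample one trajectory of (state, action, reward) triples."""
    s, traj = rng.integers(n_states), []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=policy(s))
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    return traj


def reinforce_step(lr=0.1):
    """One stochastic policy gradient ascent step from a single rollout."""
    global theta
    traj = rollout()
    grad = np.zeros_like(theta)
    G = 0.0
    for t in reversed(range(len(traj))):
        s, a, r = traj[t]
        G = r + gamma * G                # discounted return from time t
        g_log = -policy(s)
        g_log[a] += 1.0                  # gradient of log softmax w.r.t. theta[s]
        grad[s] += (gamma ** t) * G * g_log
    theta = theta + lr * grad            # plain (non-natural) gradient ascent


for _ in range(200):
    reinforce_step()
```

The natural policy gradient variant mentioned in the abstract would replace the last update line with a step preconditioned by the (estimated) Fisher information matrix; the plain ascent step above is kept for brevity.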


Related research

A Note on the Linear Convergence of Policy Gradient Methods (07/21/2020)
We revisit the finite time analysis of policy gradient methods in the si...

Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies (10/04/2022)
We consider infinite-horizon discounted Markov decision processes and st...

Performance Limits of Stochastic Sub-Gradient Learning, Part I: Single Agent Case (11/24/2015)
In this work and the supporting Part II, we examine the performance of s...

Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates (09/29/2021)
The policy gradient theorem states that the policy should only be update...

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies (06/19/2019)
Policy gradient (PG) methods are a widely used reinforcement learning me...

Global Optimality Guarantees For Policy Gradient Methods (06/05/2019)
Policy gradients methods are perhaps the most widely used class of reinf...

A Novel Framework for Policy Mirror Descent with General Parametrization and Linear Convergence (01/30/2023)
Modern policy optimization methods in applied reinforcement learning, su...
