
On the Linear convergence of Natural Policy Gradient Algorithm

by Sajad Khodadadian, et al.

Markov Decision Processes are classically solved using Value Iteration and Policy Iteration algorithms. Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization, such as gradient ascent. Among these, a popular algorithm is the Natural Policy Gradient, which is a mirror-descent variant for MDPs. This algorithm forms the basis of several popular Reinforcement Learning algorithms, such as Natural Actor-Critic, TRPO, and PPO, and so is being studied with growing interest. It has been shown that Natural Policy Gradient with constant step size converges at a sublinear rate of O(1/k) to the global optimum. In this paper, we present improved finite-time convergence bounds and show that this algorithm has a geometric (also known as linear) asymptotic convergence rate. We further improve this convergence result by introducing a variant of Natural Policy Gradient with adaptive step sizes. Finally, we compare different variants of policy gradient methods experimentally.
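For a tabular MDP with a softmax policy, the Natural Policy Gradient update reduces to a multiplicative-weights step on the advantage function: pi_{k+1}(a|s) is proportional to pi_k(a|s) * exp(eta * A^{pi_k}(s,a)). The following is a minimal sketch of that update on a hypothetical two-state, two-action MDP; the MDP itself, the step size eta, and the iteration budget are illustrative choices, not taken from the paper:

```python
import numpy as np

# Toy MDP (hypothetical): action 1 always yields reward 1, action 0 yields 0;
# transitions are identity, so action 1 is optimal in every state.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, :, s] = 1.0            # both actions keep the agent in state s
r = np.zeros((n_states, n_actions))
r[:, 1] = 1.0                   # action 1 is the rewarding one

def evaluate(pi):
    """Exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi."""
    P_pi = np.einsum('sa,sat->st', pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, r)     # expected one-step reward under pi
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum('sat,t->sa', P, V)
    return V, Q

pi = np.full((n_states, n_actions), 0.5)    # start from the uniform policy
eta = 1.0                                   # constant step size (illustrative)
for _ in range(20):
    V, Q = evaluate(pi)
    A = Q - V[:, None]                      # advantage function A(s,a)
    # NPG with softmax parameterization = multiplicative-weights update:
    # pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * A(s,a))
    pi = pi * np.exp(eta * A)
    pi /= pi.sum(axis=1, keepdims=True)

print(np.round(pi, 3))   # probability mass concentrates on optimal action 1
print(np.round(V, 3))    # V approaches the optimal value 1/(1-gamma) = 10
```

In this toy problem the per-state log-odds of the optimal action grow by a fixed amount each iteration, which is the geometric (linear) convergence behavior the paper analyzes for the exact, constant-step-size setting.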



