Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

02/22/2023
by   Emmeran Johnson, et al.

The classical algorithms used in tabular reinforcement learning (Value Iteration and Policy Iteration) have been shown to converge linearly with a rate given by the discount factor γ of a discounted Markov Decision Process. Recently, there has been increased interest in the study of gradient-based methods. In this work, we show that the dimension-free linear γ-rate of classical reinforcement learning algorithms can be achieved by a general family of unregularised Policy Mirror Descent (PMD) algorithms under an adaptive step-size. We also provide a matching worst-case lower bound that demonstrates that the γ-rate is optimal for PMD methods. Our work offers a novel perspective on the convergence of PMD. We avoid the use of the performance difference lemma beyond establishing the monotonic improvement of the iterates, which leads to a simple analysis that may be of independent interest. We also extend our analysis to the inexact setting and establish the first dimension-free ε-optimal sample complexity for unregularised PMD under a generative model, improving upon the best-known result.
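To make the γ-rate of the classical baseline concrete, the following is a minimal sketch of Value Iteration on a small random tabular MDP, tracking the sup-norm error to the optimal value function at each iteration. It illustrates only the classical rate mentioned above, not the PMD algorithm or the adaptive step-size from the paper. The helper names (`random_mdp`, `bellman_optimality`, `value_iteration_errors`), the MDP sizes, and the choice γ = 0.9 are assumptions made for illustration, and the reference optimum is itself approximated by running many iterations.

```python
import random


def random_mdp(n_states, n_actions, seed=0):
    """Build a hypothetical random MDP: P[s][a] is a distribution over next states."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_states):
        per_action = []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            per_action.append([w / total for w in weights])
        P.append(per_action)
    # Rewards in [0, 1) for every (state, action) pair.
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, R


def bellman_optimality(V, P, R, gamma):
    """Apply the Bellman optimality operator once: max over actions of r + gamma * E[V]."""
    return [
        max(
            R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(R[s]))
        )
        for s in range(len(V))
    ]


def sup_norm_diff(U, V):
    """Sup-norm distance between two value vectors."""
    return max(abs(u - v) for u, v in zip(U, V))


def value_iteration_errors(P, R, gamma, iters=30, reference_iters=2000):
    """Return the errors ||V_k - V*|| for k = 0, ..., iters - 1, starting from V_0 = 0."""
    n_states = len(R)
    # Approximate V* by iterating far beyond the horizon we measure.
    V_star = [0.0] * n_states
    for _ in range(reference_iters):
        V_star = bellman_optimality(V_star, P, R, gamma)
    V = [0.0] * n_states
    errors = []
    for _ in range(iters):
        errors.append(sup_norm_diff(V, V_star))
        V = bellman_optimality(V, P, R, gamma)
    return errors


gamma = 0.9
P, R = random_mdp(n_states=5, n_actions=3)
errors = value_iteration_errors(P, R, gamma)
for k in (0, 10, 20):
    # Observed error next to the bound gamma^k * ||V_0 - V*||.
    print(k, errors[k], gamma**k * errors[0])
```

Because the Bellman optimality operator is a γ-contraction in the sup-norm, each printed error should sit at or below the corresponding bound γ^k ‖V_0 − V*‖, and the bound does not depend on the number of states or actions. That independence is the sense in which the rate is dimension-free.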


