Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes

01/30/2021
by Guanghui Lan, et al.

We present new policy mirror descent (PMD) methods for solving reinforcement learning (RL) problems with either strongly convex or general convex regularizers. By exploring the structural properties of these seemingly highly nonconvex problems, we show that the PMD methods exhibit a fast linear rate of convergence to global optimality. We develop stochastic counterparts of these methods and establish an O(1/ϵ) (resp., O(1/ϵ^2)) sampling complexity for solving these RL problems with strongly (resp., general) convex regularizers using different sampling schemes, where ϵ denotes the target accuracy. We further show that the complexity of computing the gradients of these regularizers, if necessary, can be bounded by O{(log_γ ϵ) [(1-γ)L/μ]^{1/2} log(1/ϵ)} (resp., O{(log_γ ϵ) (L/ϵ)^{1/2}}) for problems with strongly (resp., general) convex regularizers. Here γ denotes the discount factor. To the best of our knowledge, these complexity bounds, along with our algorithmic developments, appear to be new in both the optimization and RL literature. The introduction of these convex regularizers also greatly expands the flexibility and applicability of RL models.
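
To make the idea of a mirror descent step on policies concrete, the following is a minimal sketch, not the authors' algorithm as stated in the paper. It assumes a small tabular MDP with costs to be minimized, exact policy evaluation, a KL divergence as the Bregman distance, and a negative-entropy regularizer with weight `tau`; under those assumptions the per-state update has a closed form. The names `evaluate`, `pmd_step`, `run_pmd`, and the values of `eta`, `tau`, and `gamma` are illustrative choices, not taken from the source.

```python
import numpy as np

def evaluate(P, c, pi, gamma, tau):
    """Exact regularized policy evaluation on a tabular MDP.

    P: (S, A, S) transition probabilities, c: (S, A) costs, pi: (S, A) policy.
    Assumed regularizer: h(s) = tau * sum_a pi(a|s) log pi(a|s).
    """
    S, _ = c.shape
    h = tau * np.sum(pi * np.log(pi), axis=1)      # negative entropy per state
    c_pi = np.sum(pi * c, axis=1) + h              # expected regularized cost
    P_pi = np.einsum("sa,sat->st", pi, P)          # state-to-state kernel under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    Q = c + h[:, None] + gamma * (P @ V)           # action-value function
    return V, Q

def pmd_step(pi, Q, eta, tau):
    """Closed-form minimizer of eta*(<Q, p> + h(p)) + KL(p || pi) per state."""
    logits = (np.log(pi) - eta * Q) / (1.0 + eta * tau)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stabilization
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

def run_pmd(P, c, gamma=0.9, tau=0.1, eta=1.0, iters=30):
    """Run the sketch from a uniform policy; returns the policy and value history."""
    S, A = c.shape
    pi = np.full((S, A), 1.0 / A)
    history = []
    for _ in range(iters):
        V, Q = evaluate(P, c, pi, gamma, tau)
        history.append(V)
        pi = pmd_step(pi, Q, eta, tau)
    return pi, np.array(history)

# Hypothetical random MDP, used only to exercise the sketch.
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
c = rng.random((S, A))
pi, history = run_pmd(P, c)
```

Each row of `history` holds the regularized value of the policy at that iteration, so comparing successive rows gives a rough view of how quickly the iterates settle. The stochastic variants mentioned in the abstract would replace the exact `evaluate` call with a sampled estimate, which this sketch does not attempt.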


research
01/15/2022

Block Policy Mirror Descent

In this paper, we present a new class of policy gradient (PG) methods, n...
research
11/30/2022

Policy Optimization over General State and Action Spaces

Reinforcement learning (RL) problems over general state and action space...
research
02/06/2022

Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning

In reinforcement learning (RL), offline learning decoupled learning from...
research
05/24/2021

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

Policy optimization, which learns the policy of interest by maximizing t...
research
06/21/2022

A Single-Timescale Analysis For Stochastic Approximation With Multiple Coupled Sequences

Stochastic approximation (SA) with multiple coupled sequences has found ...
research
10/18/2019

On Connections between Constrained Optimization and Reinforcement Learning

Dynamic Programming (DP) provides standard algorithms to solve Markov De...
research
08/17/2022

Sampling Through the Lens of Sequential Decision Making

Sampling is ubiquitous in machine learning methodologies. Due to the gro...
