Block Policy Mirror Descent

01/15/2022
by Guanghui Lan, et al.

In this paper, we present a new class of policy gradient (PG) methods, namely the block policy mirror descent (BPMD) methods, for solving a class of regularized reinforcement learning (RL) problems with (strongly) convex regularizers. Compared to traditional PG methods with a batch update rule, which visit and update the policy for every state, BPMD methods have cheap per-iteration computation via a partial update rule that performs the policy update on a single sampled state. Despite the nonconvex nature of the problem and the partial update rule, BPMD methods achieve fast linear convergence to global optimality. We further extend BPMD methods to the stochastic setting by utilizing stochastic first-order information constructed from samples. We establish O(1/ϵ) (resp. O(1/ϵ^2)) sample complexity for strongly convex (resp. non-strongly convex) regularizers, with different procedures for constructing the stochastic first-order information, where ϵ denotes the target accuracy. To the best of our knowledge, this is the first time that block coordinate descent methods have been developed and analyzed for policy optimization in reinforcement learning.
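To make the partial update rule concrete, here is a minimal sketch of one BPMD-style iteration in a tabular setting. It assumes a negative-entropy mirror map, under which the KL-regularized mirror descent step has a closed-form multiplicative (softmax-like) update; the function name `bpmd_step`, the step size `eta`, and the tabular `Q` array are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def bpmd_step(pi, Q, s, eta):
    """One block policy mirror descent update (illustrative sketch).

    Updates the policy only at the sampled state s via a KL-regularized
    mirror descent step, which for the negative-entropy mirror map
    reduces to the multiplicative rule
        pi_new(a|s) proportional to pi(a|s) * exp(-eta * Q(s, a)).
    Policies at all other states are left untouched (the partial update).
    """
    pi = pi.copy()
    # Work in log space and subtract the max for numerical stability.
    logits = np.log(pi[s] + 1e-12) - eta * Q[s]
    w = np.exp(logits - logits.max())
    pi[s] = w / w.sum()
    return pi

# Small usage example: uniform initial policy, random Q-values,
# update only the policy block at state s = 2.
rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
pi0 = np.full((n_states, n_actions), 1.0 / n_actions)
Q = rng.standard_normal((n_states, n_actions))
pi1 = bpmd_step(pi0, Q, s=2, eta=0.5)
```

The cheap per-iteration cost comes from touching a single row of the policy table: the update at state `s` costs O(|A|) once the Q-values (or their stochastic estimates) at that state are available, rather than a sweep over every state.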

Related research

- Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes (01/30/2021)
- Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning (02/12/2020)
- Independent Policy Gradient Methods for Competitive Reinforcement Learning (01/11/2021)
- Stochastic Second-Order Methods Provably Beat SGD For Gradient-Dominated Functions (05/25/2022)
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic (07/10/2020)
- Policy Optimization over General State and Action Spaces (11/30/2022)
- Sample Efficient Reinforcement Learning in Mixed Systems through Augmented Samples and Its Applications to Queueing Networks (05/25/2023)
