
Block Policy Mirror Descent

by Guanghui Lan, et al.

In this paper, we present a new class of policy gradient (PG) methods, namely the block policy mirror descent (BPMD) methods, for solving a class of regularized reinforcement learning (RL) problems with (strongly) convex regularizers. Compared to traditional PG methods with a batch update rule, which visit and update the policy for every state, BPMD methods have cheap per-iteration computation via a partial update rule that performs the policy update on a sampled state. Despite the nonconvex nature of the problem and the partial update rule, BPMD methods achieve fast linear convergence to global optimality. We further extend BPMD methods to the stochastic setting by utilizing stochastic first-order information constructed from samples. We establish Õ(1/ϵ) (resp. Õ(1/ϵ^2)) sample complexity for strongly convex (resp. non-strongly convex) regularizers, with different procedures for constructing the stochastic first-order information, where ϵ denotes the target accuracy. To the best of our knowledge, this is the first time that block coordinate descent methods have been developed and analyzed for policy optimization in reinforcement learning.
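To make the partial update rule concrete, the following is a minimal sketch of one block update on a small tabular MDP. It is not taken from the paper: it assumes costs are minimized, a negative-entropy regularizer with weight `tau`, a KL divergence as the mirror map, exact policy evaluation, uniform state sampling, and a fixed step size `eta`. The names `evaluate_q` and `bpmd_step` are hypothetical and chosen only for illustration.

```python
import numpy as np

def evaluate_q(P, c, pi, gamma, tau):
    """Exact evaluation of the regularized value function (illustrative only).

    P:  transition tensor of shape (S, A, S), P[s, a, t] = Pr(t | s, a)
    c:  cost matrix of shape (S, A)
    pi: policy of shape (S, A), rows strictly positive and summing to 1
    """
    S, _ = c.shape
    h = np.sum(pi * np.log(pi), axis=1)            # negative entropy per state
    c_pi = np.sum(pi * c, axis=1) + tau * h        # regularized one-step cost
    P_pi = np.einsum("sa,sat->st", pi, P)          # state-to-state kernel under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    # The regularizer term at the current state is constant across actions,
    # so it is omitted from Q here; it does not change the update below.
    Q = c + gamma * (P @ V)
    return Q, V

def bpmd_step(P, c, pi, gamma, tau, eta, rng):
    """Update the policy at ONE uniformly sampled state (assumed sampling rule)."""
    Q, _ = evaluate_q(P, c, pi, gamma, tau)
    s = rng.integers(c.shape[0])
    # Closed form of the KL-prox step with a negative-entropy regularizer:
    # new(a) is proportional to pi(a|s)^(1/(1+eta*tau)) * exp(-eta*Q(s,a)/(1+eta*tau)).
    logits = (np.log(pi[s]) - eta * Q[s]) / (1.0 + eta * tau)
    logits -= logits.max()                         # numerical stability
    row = np.exp(logits)
    row /= row.sum()
    new_pi = pi.copy()
    new_pi[s] = row                                # all other states are untouched
    return new_pi, s
```

A batch method would apply the same closed-form step to every row of `pi` in each iteration; the block variant touches only the sampled row. This sketch still evaluates the value function exactly for all states, which is a simplification made for readability rather than a faithful account of the per-iteration cost discussed in the abstract.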




Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcemen...

Independent Policy Gradient Methods for Competitive Reinforcement Learning

We obtain global, non-asymptotic convergence guarantees for independent ...

Stochastic Second-Order Methods Provably Beat SGD For Gradient-Dominated Functions

We study the performance of Stochastic Cubic Regularized Newton (SCRN) o...

Sample Complexity of Policy-Based Methods under Off-Policy Sampling and Linear Function Approximation

In this work, we study policy-based methods for solving the reinforcemen...

Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks

We present a stochastic first-order optimization algorithm, named BCSC, ...

Sparse Q-learning with Mirror Descent

This paper explores a new framework for reinforcement learning based on ...