Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

06/06/2022
by Dongsheng Ding, et al.

We study sequential decision-making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected subgradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization we prove that our method achieves global convergence with sublinear rates with respect to both the optimality gap and the constraint violation. This convergence is independent of the size of the state-action space, i.e., it is dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by the restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. Finally, we use computational experiments to demonstrate the merits and effectiveness of our approach.
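To make the primal-dual scheme concrete, below is a minimal tabular sketch (not taken from the paper) of the update it describes. It assumes a small finite CMDP with known transition tensor P, reward r, utility g, constraint threshold b, and initial distribution rho (all illustrative names), replaces sampling with exact policy evaluation, and uses the standard fact that under the softmax parametrization a natural policy gradient step reduces to an exponentiated-advantage (multiplicative-weights) policy update. Step sizes and the dual bound lam_max are arbitrary placeholders.

```python
# Hypothetical sketch of an NPG-PD iteration on a tabular CMDP.
# Primal: natural policy gradient ascent on the Lagrangian reward r + lam * g.
# Dual: projected subgradient descent on lam over the interval [0, lam_max].
import numpy as np

def policy_eval(P, f, pi, gamma):
    """Exact values of signal f (reward or utility) under policy pi.

    P: transitions of shape (S, A, S); f: per-step signal of shape (S, A).
    Solves (I - gamma * P_pi) V = f_pi for the state values V.
    """
    S, A, _ = P.shape
    P_pi = np.einsum('sa,sat->st', pi, P)          # state transitions under pi
    f_pi = np.einsum('sa,sa->s', pi, f)            # expected one-step signal
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, f_pi)
    Q = f + gamma * np.einsum('sat,t->sa', P, V)   # action values
    return V, Q

def npg_pd(P, r, g, b, rho, gamma, eta1=1.0, eta2=0.1, lam_max=10.0, T=500):
    S, A, _ = P.shape
    pi = np.full((S, A), 1.0 / A)                  # uniform initial policy
    lam = 0.0                                      # dual variable
    for _ in range(T):
        # Advantage of the Lagrangian reward r + lam * g under the current pi.
        V_L, Q_L = policy_eval(P, r + lam * g, pi, gamma)
        adv = Q_L - V_L[:, None]
        # Softmax NPG step == multiplicative-weights (exponentiated) update.
        pi = pi * np.exp(eta1 / (1.0 - gamma) * adv)
        pi /= pi.sum(axis=1, keepdims=True)
        # Dual step: move lam against the constraint slack V_g(rho) - b,
        # then project back onto [0, lam_max].
        V_g, _ = policy_eval(P, g, pi, gamma)
        slack = rho @ V_g - b
        lam = np.clip(lam - eta2 * slack, 0.0, lam_max)
    return pi, lam
```

The sample-based algorithms mentioned in the abstract would replace the exact policy_eval calls with estimated values; the overall alternation between the primal ascent step and the projected dual descent step is unchanged.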

Related research

- 05/22/2022: Policy-based Primal-Dual Methods for Convex Constrained Markov Decision Processes
  We study convex Constrained Markov Decision Processes (CMDPs) in which t...

- 06/20/2023: Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs
  We study the problem of computing an optimal policy of an infinite-horiz...

- 04/11/2022: Towards Painless Policy Optimization for Constrained MDPs
  We study policy optimization in an infinite horizon, γ-discounted constr...

- 10/01/2022: Primal-dual regression approach for Markov decision processes with general state and action space
  We develop a regression based primal-dual martingale approach for solvin...

- 01/24/2022: Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity
  We propose the homotopic policy mirror descent (HPMD) method for solving...

- 03/04/2020: Exploration-Exploitation in Constrained MDPs
  In many sequential decision-making problems, the goal is to optimize a u...

- 11/03/2022: Geometry and convergence of natural policy gradient methods
  We study the convergence of several natural policy gradient (NPG) method...
