Policy-based Primal-Dual Methods for Convex Constrained Markov Decision Processes

05/22/2022
by   Donghao Ying, et al.
0

We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action visitation distribution. We propose a policy-based primal-dual algorithm that updates the primal variable via policy gradient ascent and updates the dual variable via projected sub-gradient descent. Despite the loss of additivity structure and the nonconvex nature, we establish the global convergence of the proposed algorithm by leveraging a hidden convexity in the problem under the general soft-max parameterization, and prove the 𝒪(T^-1/3) convergence rate in terms of both optimality gap and constraint violation. When the objective is strongly concave in the visitation distribution, we prove an improved convergence rate of 𝒪(T^-1/2). By introducing a pessimistic term to the constraint, we further show that a zero constraint violation can be achieved while preserving the same convergence rate for the optimality gap. This work is the first one in the literature that establishes non-asymptotic convergence guarantees for policy-based primal-dual methods for solving infinite-horizon discounted convex CMDPs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/06/2022

Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

We study sequential decision making problems aimed at maximizing the exp...
research
10/17/2021

A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization

We study entropy-regularized constrained Markov decision processes (CMDP...
research
10/31/2021

Fast Global Convergence of Policy Optimization for Constrained MDPs

We address the issue of safety in reinforcement learning. We pose the pr...
research
06/06/2019

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

We propose a new objective function for finite-horizon episodic Markov d...
research
05/07/2020

A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

The canonical solution methodology for finite constrained Markov decisio...
research
10/20/2021

Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

The problem of constrained Markov decision process (CMDP) is investigate...
research
04/11/2022

Towards Painless Policy Optimization for Constrained MDPs

We study policy optimization in an infinite horizon, γ-discounted constr...

Please sign up or login with your details

Forgot password? Click here to reset