Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

by Tianjiao Li et al.

The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is proposed with a novel integration of three ingredients: an entropy-regularized policy optimizer, a dual variable regularizer, and a Nesterov-accelerated gradient descent dual optimizer, all of which are critical to achieving faster convergence. The finite-time error bound of the proposed approach is characterized. Despite the challenge of a nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of 𝒪̃(1/ϵ) in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of 𝒪(1/ϵ) <cit.>. This is the first demonstration that nonconcave CMDP problems can attain the 𝒪(1/ϵ) complexity lower bound for convex optimization subject to convex constraints. Our primal-dual approach and non-asymptotic analysis are agnostic to the RL optimizer used, and are thus more flexible for practical applications. More generally, our approach also serves as the first algorithm that provably accelerates constrained nonconvex optimization with zero duality gap by exploiting geometries such as the gradient dominance condition, to which the existing acceleration methods for constrained convex optimization are not applicable.
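The interplay of the three ingredients can be illustrated on a toy one-state (bandit) CMDP, where the entropy-regularized policy optimizer admits a closed-form softmax best response and the dual variable is updated by Nesterov-accelerated projected gradient descent on the regularized dual function. This is a schematic sketch under simplifying assumptions, not the paper's algorithm or constants: the instance (r, u, b) and the hyperparameters tau, delta, and the step size are illustrative choices.

```python
import numpy as np

# Toy one-state CMDP (a constrained bandit): maximize r @ pi over the simplex
# subject to u @ pi >= b. All numbers below are illustrative, not from the paper.
r = np.array([1.0, 0.2, 0.5])   # per-action rewards
u = np.array([0.1, 1.0, 0.6])   # per-action utilities (constraint: u @ pi >= b)
b = 0.7                         # utility threshold
tau, delta = 0.05, 1e-3         # entropy regularizer and dual regularizer

def best_response(lam):
    """Entropy-regularized policy optimizer: closed-form softmax solution of
    argmax_pi (r + lam * u) @ pi + tau * H(pi)."""
    z = (r + lam * u) / tau
    z -= z.max()                # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Nesterov-accelerated projected gradient descent on the regularized dual
# d(lam) + (delta / 2) * lam**2; by Danskin's theorem d'(lam) = u @ pi(lam) - b.
eta = tau / (u @ u)             # step size ~ 1/L for the tau-smoothed dual
lam, lam_prev = 0.0, 0.0
for t in range(1, 3000):
    y = lam + (t - 1) / (t + 2) * (lam - lam_prev)  # momentum extrapolation
    pi = best_response(y)
    grad = (u @ pi - b) + delta * y                 # regularized dual gradient
    lam_prev = lam
    lam = max(0.0, y - eta * grad)                  # project onto lam >= 0

pi = best_response(lam)
print("policy:", pi, "utility:", u @ pi, "dual:", lam)
```

The entropy term both yields the closed-form primal step and smooths the dual function, which is what makes an accelerated dual scheme applicable; the small dual regularizer delta adds strong convexity at the cost of a controlled perturbation of the constraint.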



