A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

05/07/2020
by Sami Khairy, et al.

The canonical solution methodology for finite constrained Markov decision processes (CMDPs), where the objective is to maximize the expected infinite-horizon discounted reward subject to constraints on the expected infinite-horizon discounted costs, is based on convex linear programming. In this brief, we first prove that the optimization objective in the dual linear program of a finite CMDP is a piecewise linear convex (PWLC) function of the Lagrange penalty multipliers. Next, we propose a novel two-level Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP. The proposed algorithm is applied to two stochastic control problems with constraints: robot navigation in a grid world and solar-powered unmanned aerial vehicle (UAV)-based wireless network management. We empirically compare the convergence performance of the proposed GAS algorithm with binary search (BS), Lagrangian primal-dual optimization (PDO), and linear programming (LP). Compared with these benchmark algorithms, the proposed GAS algorithm converges to the optimal solution faster, does not require hyper-parameter tuning, and is not sensitive to the initialization of the Lagrange penalty multiplier.
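The PWLC property can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' GAS algorithm: it uses the standard Lagrangian relaxation with a single cost constraint, in which the dual function is the maximum, over the finitely many deterministic policies, of expressions that are linear in the multiplier. The two-state CMDP, its numbers, and the names `evaluate`, `dual`, and `dual_slope` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy CMDP; every number below is an illustrative assumption.
GAMMA = 0.9
P = [  # P[s][a] = next-state probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
R = [[1.0, 2.0], [0.0, 3.0]]  # reward r(s, a)
C = [[0.0, 1.0], [0.0, 2.0]]  # cost c(s, a)
MU = [0.5, 0.5]               # initial state distribution
BUDGET = 5.0                  # bound on the expected discounted cost
N_STATES, N_ACTIONS = len(R), len(R[0])


def evaluate(policy, payoff, sweeps=2000):
    """Expected discounted payoff of a deterministic policy, by fixed-point iteration."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [
            payoff[s][policy[s]]
            + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(N_STATES)
        ]
    return sum(m * x for m, x in zip(MU, v))


# One (reward value, cost value) pair per deterministic policy.
POLICIES = list(product(range(N_ACTIONS), repeat=N_STATES))
VALUES = [(evaluate(pi, R), evaluate(pi, C)) for pi in POLICIES]


def dual(lam):
    """Lagrangian dual: a max of lines in lam, hence piecewise linear and convex."""
    return max(vr - lam * (vc - BUDGET) for vr, vc in VALUES)


def dual_slope(lam):
    """A subgradient of the dual at lam: budget minus the cost of a maximizing policy."""
    _, vc = max(VALUES, key=lambda v: v[0] - lam * (v[1] - BUDGET))
    return BUDGET - vc
```

Because each policy contributes one line, the slope of `dual` changes only at finitely many breakpoints, and the sign of `dual_slope` indicates on which side of the current multiplier a minimizer lies. Enumerating policies is feasible only for tiny problems; it is used here purely to make the structure visible.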

research
05/22/2022

Policy-based Primal-Dual Methods for Convex Constrained Markov Decision Processes

We study convex Constrained Markov Decision Processes (CMDPs) in which t...
research
06/28/2022

Linear programming-based solution methods for constrained POMDPs

Constrained partially observable Markov decision processes (CPOMDPs) hav...
research
03/15/2019

On Sample Complexity of Projection-Free Primal-Dual Methods for Learning Mixture Policies in Markov Decision Processes

We study the problem of learning policy of an infinite-horizon, discount...
research
06/12/2023

Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

Constrained Markov Decision Processes (CMDPs) are one of the common ways...
research
10/17/2021

A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization

We study entropy-regularized constrained Markov decision processes (CMDP...
research
09/10/2023

Convex Q Learning in a Stochastic Environment: Extended Version

The paper introduces the first formulation of convex Q-learning for Mark...
research
12/30/2021

A General Traffic Shaping Protocol in E-Commerce

To approach different business objectives, online traffic shaping algori...
