Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes

10/21/2021
by Sihan Zeng, et al.

We consider a policy optimization problem for a discounted-cost constrained Markov decision process (CMDP), in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method in which the gradients of both the primal and dual functions are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes. This online primal-dual natural actor-critic algorithm maintains and iteratively updates three variables: a dual variable (or Lagrange multiplier), a primal variable (or actor), and a critic variable used to estimate the gradients of both the primal and dual variables. These variables are updated simultaneously but on different time scales (using different step sizes), and their updates are coupled with one another. Our main contribution is a finite-time analysis of the convergence of this algorithm to the global optimum of a CMDP problem. Specifically, we show that, with a proper choice of step sizes, the optimality gap and constraint violation converge to zero in expectation at a rate of 𝒪(1/K^{1/6}), where K is the number of iterations. To our knowledge, this paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem. We also validate the effectiveness of this algorithm through numerical simulations.
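The three coupled updates described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small random tabular CMDP, a softmax policy, a SARSA-style TD(0) critic, and a single constraint of the form V_g >= b. The function name run_primal_dual_nac, the problem sizes, the threshold b, the cap lam_max, and the step-size exponents are all hypothetical choices made for illustration; in particular, the step sizes are not the ones analyzed in the paper.

```python
import numpy as np


def softmax(theta):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def run_primal_dual_nac(num_iters=5000, gamma=0.9, b=4.0, lam_max=10.0, seed=0):
    """Sketch of an online primal-dual natural actor-critic loop.

    All problem data below is randomly generated (an assumption for
    illustration); the constraint is V_g(policy) >= b.
    """
    rng = np.random.default_rng(seed)
    S, A = 4, 2                                  # hypothetical problem size
    P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
    r = rng.uniform(size=(S, A))                 # reward
    g = rng.uniform(size=(S, A))                 # utility for the single constraint
    rho = np.full(S, 1.0 / S)                    # initial-state distribution

    theta = np.zeros((S, A))                     # actor (primal variable)
    lam = 0.0                                    # dual variable (Lagrange multiplier)
    Q_r = np.zeros((S, A))                       # critic for the reward
    Q_g = np.zeros((S, A))                       # critic for the utility

    s = rng.integers(S)
    a = rng.choice(A, p=softmax(theta)[s])

    for k in range(num_iters):
        # Illustrative step sizes: the critic moves fastest, the actor slowest.
        beta = 1.0 / (k + 1) ** 0.5              # critic
        eta = 0.5 / (k + 1) ** 0.75              # dual
        alpha = 0.5 / (k + 1) ** (5.0 / 6.0)     # actor

        # One transition of the single ongoing trajectory.
        pi = softmax(theta)
        s_next = rng.choice(S, p=P[s, a])
        a_next = rng.choice(A, p=pi[s_next])

        # Critic: TD(0) updates of both Q-functions from the same sample.
        Q_r[s, a] += beta * (r[s, a] + gamma * Q_r[s_next, a_next] - Q_r[s, a])
        Q_g[s, a] += beta * (g[s, a] + gamma * Q_g[s_next, a_next] - Q_g[s, a])

        # Dual: projected gradient descent on lam, using the critic's
        # current estimate of V_g under the policy sampled from above.
        V_g = rho @ (pi * Q_g).sum(axis=1)
        lam = float(np.clip(lam - eta * (V_g - b), 0.0, lam_max))

        # Actor: for a tabular softmax policy, a natural gradient step
        # adds the (estimated) Lagrangian Q-values to the logits.
        theta += alpha * (Q_r + lam * Q_g)

        s, a = s_next, a_next

    return softmax(theta), lam
```

Calling run_primal_dual_nac(num_iters=500, seed=0) returns the final policy as an array whose rows are action distributions, together with the final multiplier. All three variables are updated at every iteration from the same sampled transition, which is the coupling the abstract refers to.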


research
09/29/2021

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

We study a novel two-time-scale stochastic gradient method for solving o...
research
06/30/2011

Dual Modelling of Permutation and Injection Problems

When writing a constraint program, we have to choose which variables sho...
research
12/29/2017

Boosting the Actor with Dual Critic

This paper proposes a new actor-critic-style algorithm called Dual Actor...
research
02/28/2022

Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation

We study the convergence of the actor-critic algorithm with nonlinear fu...
research
03/02/2023

Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

Sequential incentive marketing is an important approach for online busin...
research
11/29/2022

Interpreting Primal-Dual Algorithms for Constrained MARL

Constrained multiagent reinforcement learning (C-MARL) is gaining import...
research
08/21/2022

Robust Tests in Online Decision-Making

Bandit algorithms are widely used in sequential decision problems to max...
