Markov Decision Processes with Long-Term Average Constraints

06/12/2021
by Mridul Agarwal, et al.

We consider the constrained Markov Decision Process (CMDP) problem, in which an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward and also incurs costs under K cost functions. The agent aims to maximize the long-term average reward while keeping each of the K long-term average costs below a given threshold. In this paper, we propose CMDP-PSRL, a posterior-sampling-based algorithm with which the agent can learn optimal policies for interacting with the CMDP. For an MDP with S states, A actions, and diameter D, we prove that by following the CMDP-PSRL algorithm, the agent's regret for not accumulating the rewards of the optimal policy is bounded by O(poly(DSA)√T). We further show that the violation of each of the K constraints is also bounded by O(poly(DSA)√T). To the best of our knowledge, this is the first work to obtain an O(√T) regret bound for ergodic MDPs with long-term average constraints.
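To make the objective concrete, the following is a minimal sketch of how the long-term average reward and the long-term average costs of a fixed stationary policy can be evaluated when the model is known. It is not the CMDP-PSRL algorithm from the paper. The toy two-state MDP, the function names (stationary_distribution, policy_averages), and the threshold value are all hypothetical, and the sketch assumes the chain induced by the policy is aperiodic so that power iteration converges.

```python
# Illustrative only: evaluates one fixed policy on a known toy model.
# It does not learn anything and is not the paper's algorithm.

def stationary_distribution(P_pi, iters=10_000, tol=1e-12):
    """Power iteration on a row-stochastic matrix P_pi (list of rows).

    Assumes the chain is unichain and aperiodic, so the iteration converges.
    """
    n = len(P_pi)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(mu[s] * P_pi[s][t] for s in range(n)) for t in range(n)]
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    return mu


def policy_averages(P, r, costs, policy):
    """Return (average reward, list of K average costs) for a stationary policy.

    P[s][a][t]     : probability of moving from state s to t under action a
    r[s][a]        : reward for taking action a in state s
    costs[k][s][a] : k-th cost for taking action a in state s
    policy[s][a]   : probability of choosing action a in state s
    """
    S, A = len(P), len(P[0])
    # Markov chain induced by the policy.
    P_pi = [[sum(policy[s][a] * P[s][a][t] for a in range(A)) for t in range(S)]
            for s in range(S)]
    mu = stationary_distribution(P_pi)

    def average(f):
        return sum(mu[s] * policy[s][a] * f[s][a]
                   for s in range(S) for a in range(A))

    return average(r), [average(c) for c in costs]


# Hypothetical toy CMDP: S = 2 states, A = 2 actions, K = 1 cost function.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
r = [[1.0, 0.0], [0.0, 0.0]]
costs = [[[0.0, 0.0], [1.0, 1.0]]]   # cost 1 whenever the agent is in state 1
thresholds = [0.2]                   # assumed bound on the average cost
policy = [[1.0, 0.0], [1.0, 0.0]]    # always choose action 0

avg_r, avg_c = policy_averages(P, r, costs, policy)
feasible = all(c <= th for c, th in zip(avg_c, thresholds))
print(avg_r, avg_c, feasible)
```

For this toy policy the induced chain has stationary distribution (5/6, 1/6), so the average reward is about 0.833 and the average cost is about 0.167, which is below the assumed threshold of 0.2. A learning agent does not know P in advance, which is why regret and constraint violation over time become the quantities of interest.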

research
06/10/2020

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

In the optimization of dynamical systems, the variables typically have c...
research
07/03/2019

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

We propose a new complexity measure for Markov decision processes (MDP),...
research
01/23/2019

Reinforcement Learning of Markov Decision Processes with Peak Constraints

In this paper, we consider reinforcement learning of Markov Decision Pro...
research
04/27/2023

A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints

We study online learning in episodic constrained Markov decision process...
research
09/11/2023

Career Path Recommendations for Long-term Income Maximization: A Reinforcement Learning Approach

This study explores the potential of reinforcement learning algorithms t...
research
07/11/2017

Synthesis of Optimal Resilient Control Strategies

Repair mechanisms are important within resilient systems to maintain the...
research
03/11/2020

Model-Free Algorithm and Regret Analysis for MDPs with Peak Constraints

In the optimization of dynamic systems, the variables typically have con...
