A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints

04/27/2023
by Jacopo Germano, et al.

We study online learning in episodic constrained Markov decision processes (CMDPs), where the learner's goal is to collect as much reward as possible over the episodes while guaranteeing that some long-term constraints are satisfied during the learning process. Rewards and constraints can be selected either stochastically or adversarially, and the transition function is not known to the learner. While online learning in classical unconstrained MDPs has received considerable attention in recent years, the setting of CMDPs is still largely unexplored. This is surprising, since in real-world applications, such as autonomous driving, automated bidding, and recommender systems, there are usually additional constraints and specifications that an agent has to obey during the learning process. In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with long-term constraints. Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underlying process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially.
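
For readers unfamiliar with the two performance measures named in the abstract, the following is a minimal sketch of how cumulative regret and cumulative constraint violation are commonly formalized in this line of work. All notation here is an assumption introduced for illustration and is not taken from the paper: T is the number of episodes, \pi_t the policy played in episode t, r_t the reward function, g_{t,i} the i-th of m constraint cost functions with threshold \alpha_i, V(\pi; f) the expected per-episode value of f under policy \pi, and \pi^\star a benchmark policy whose exact definition depends on the setting and is given in the full text.

```latex
% Hypothetical notation, for illustration only (not the paper's).
% Cumulative regret against a benchmark policy \pi^\star:
R_T \;=\; \sum_{t=1}^{T} V(\pi^\star; r_t) \;-\; \sum_{t=1}^{T} V(\pi_t; r_t)

% Cumulative violation of the worst long-term constraint:
V_T \;=\; \max_{i \in \{1,\dots,m\}} \; \sum_{t=1}^{T} \bigl( V(\pi_t; g_{t,i}) - \alpha_i \bigr)
```

Under this reading, "long-term" means that a constraint may be violated in individual episodes as long as the cumulative violation V_T grows sublinearly in T, and the usual learning goal is for both R_T and V_T to be sublinear in T.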

research
06/12/2021

Markov Decision Processes with Long-Term Average Constraints

We consider the problem of constrained Markov Decision Process (CMDP) wh...
research
09/15/2022

A Unifying Framework for Online Optimization with Long-Term Constraints

We study online learning problems in which a decision maker has to take ...
research
06/10/2020

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

In the optimization of dynamical systems, the variables typically have c...
research
10/30/2022

Online Convex Optimization with Long Term Constraints for Predictable Sequences

In this paper, we investigate the framework of Online Convex Optimizatio...
research
03/11/2020

Model-Free Algorithm and Regret Analysis for MDPs with Peak Constraints

In the optimization of dynamic systems, the variables typically have con...
research
07/06/2020

Online Learning of Facility Locations

In this paper, we provide a rigorous theoretical investigation of an onl...
research
11/13/2019

Asynchronous Distributed Learning from Constraints

In this paper, the extension of the framework of Learning from Constrain...
