Interpreting Primal-Dual Algorithms for Constrained MARL

11/29/2022
by Daniel Tabas, et al.

Constrained multiagent reinforcement learning (C-MARL) is gaining importance as MARL algorithms find new applications in real-world systems ranging from energy systems to drone swarms. Most C-MARL algorithms use a primal-dual approach to enforce constraints through a penalty function added to the reward. In this paper, we study the structural effects of this penalty term on the MARL problem. First, we show that the standard practice of using the constraint function as the penalty leads to a weak notion of safety. However, by making simple modifications to the penalty term, we can enforce meaningful probabilistic constraints (chance and conditional value-at-risk). Second, we quantify the effect of the penalty term on the value function, uncovering an improved value estimation procedure. We use these insights to propose a constrained multiagent advantage actor-critic (C-MAA2C) algorithm. Simulations in a simple constrained multiagent environment confirm that our reinterpretation of the primal-dual method in terms of probabilistic constraints is effective, and that our proposed value estimate accelerates convergence to a safe joint policy.
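The primal-dual mechanism the abstract describes can be sketched as follows: the policy is trained on a reward shaped by a Lagrangian penalty, while a dual variable is updated by gradient ascent on the constraint violation. The snippet below is a minimal, generic sketch of this scheme, not the paper's C-MAA2C algorithm; the names (`cost_limit`, `lr_dual`, the indicator-based chance penalty) are illustrative assumptions. The indicator penalty reflects the abstract's point that modifying the penalty term (here, penalizing the event of violation rather than the raw constraint value) corresponds to a probabilistic (chance) constraint, since the expectation of an indicator is a violation probability.

```python
# Generic primal-dual sketch for constrained RL (illustrative, not the
# paper's algorithm). Names like cost_limit and lr_dual are assumptions.

def penalized_reward(reward, cost, lam):
    """Reward shaped by the Lagrangian penalty term lambda * cost."""
    return reward - lam * cost

def chance_penalty_cost(cost, threshold=0.0):
    """Indicator of violation: E[indicator] is a violation probability,
    so penalizing it enforces a chance-style constraint (our reading)."""
    return float(cost > threshold)

def dual_update(lam, avg_episode_cost, cost_limit, lr_dual=0.01):
    """Gradient ascent on the dual variable, projected to stay >= 0."""
    return max(0.0, lam + lr_dual * (avg_episode_cost - cost_limit))

# Toy usage: episode costs above the limit drive lambda up, tightening
# the penalty; costs below the limit let lambda relax back toward zero.
lam = 0.0
for avg_cost in [1.5, 1.2, 0.8, 0.4]:
    lam = dual_update(lam, avg_cost, cost_limit=0.5)
print(round(lam, 4))  # → 0.019
```

In a full actor-critic implementation, `penalized_reward` would feed the advantage estimate for the policy gradient step, and `dual_update` would run on the average episode cost between policy updates.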


Related research

02/19/2018 · Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning
Constrained Markov Decision Process (CMDP) is a natural framework for re...

06/13/2023 · A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
Offline constrained reinforcement learning (RL) aims to learn a policy t...

12/22/2020 · Dynamic penalty function approach for constraints handling in reinforcement learning
Reinforcement learning (RL) is attracting attention as an effective way...

05/25/2021 · Safe Value Functions
The relationship between safety and optimality in control is not well un...

10/21/2021 · Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes
We consider a discounted cost constrained Markov decision process (CMDP)...

05/17/2019 · Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning
We assume that we are given a time series of data from a dynamical syste...

07/27/2023 · A Self-Adaptive Penalty Method for Integrating Prior Knowledge Constraints into Neural ODEs
The continuous dynamics of natural systems has been effectively modelled...
