MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

05/29/2021
by Junyu Zhang, et al.

We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team's long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We propose Decentralized Shadow Reward Actor-Critic (DSAC), in which agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward". With high probability, DSAC converges to ϵ-stationarity in 𝒪(1/ϵ^2.5) steps, or in a faster 𝒪(1/ϵ^2) steps, depending on the amount of communication. We further establish the non-existence of spurious stationary points for this problem, that is, DSAC finds the globally optimal policy. Experiments demonstrate the merits of goals beyond the cumulative return in cooperative MARL.
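To make the "shadow reward" idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a tabular setting, a single sampled trajectory, and a smoothed entropy utility as one possible exploration-style general utility; the function names (`empirical_occupancy`, `shadow_reward`, `mix`), the discount `gamma`, and the smoothing constant `eps` are all hypothetical choices made for illustration.

```python
import math

def empirical_occupancy(trajectory, n_states, n_actions, gamma=0.9):
    """Discounted state-action visitation estimate from one trajectory.

    `trajectory` is a list of (state, action) index pairs. This is a
    truncated estimate, so the entries need not sum exactly to 1.
    """
    lam = [[0.0] * n_actions for _ in range(n_states)]
    for t, (s, a) in enumerate(trajectory):
        lam[s][a] += (1.0 - gamma) * gamma ** t
    return lam

def shadow_reward(lam, eps=1e-8):
    """Derivative of an assumed utility with respect to the occupancy measure.

    Assumed utility: F(lam) = -sum (lam + eps) * log(lam + eps), a smoothed
    entropy. Its partial derivative in each entry is -(log(lam + eps) + 1).
    """
    return [[-(math.log(x + eps) + 1.0) for x in row] for row in lam]

def mix(values, weights):
    """Weighted averaging with neighbors.

    Row i of `weights` holds agent i's mixing coefficients over all agents
    (zero for non-neighbors). Scalars are mixed here purely for illustration.
    """
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

# Usage: estimate occupancy, then treat its utility gradient as the reward.
lam = empirical_occupancy([(0, 0), (1, 1), (0, 0)], n_states=2, n_actions=2, gamma=0.5)
r = shadow_reward(lam)
averaged = mix([1.0, 3.0], [[0.5, 0.5], [0.5, 0.5]])
```

Under this assumed utility, rarely visited state-action pairs receive a larger shadow reward, which is how an entropy objective encourages exploration. For a linear utility F(λ) = ⟨r, λ⟩ the derivative is simply r, which recovers the usual cumulative-return setting.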


