Counterfactual Multi-Agent Policy Gradients

05/24/2017
by   Jakob Foerster, et al.

Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
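
The counterfactual baseline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a hypothetical critic that, for one agent, returns one Q-value per action that agent could take while the other agents' actions are held fixed. The function name, argument names, and numbers are made up for illustration.

```python
def counterfactual_advantage(q_values, policy, action):
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values: one Q-value per action of this agent, with the other agents'
              actions held fixed (assumed output of a centralised critic).
    policy:   this agent's probability for each of its actions.
    action:   index of the action the agent actually took.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[action] - baseline


# Toy example with illustrative numbers (assumptions, not results from the paper).
q = [1.0, 2.0, 4.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action=2)
print(adv)
```

Because the baseline is an expectation over the agent's own policy, the advantage averaged over that policy is zero in this sketch. The quantity therefore measures how much the chosen action improves on what the agent would have contributed on average.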

research
10/16/2021

Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning

Policy gradient methods have become popular in multi-agent reinforcement...
research
04/14/2021

Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning

Deep reinforcement learning methods have shown great performance on many...
research
06/12/2020

Learning to Communicate Using Counterfactual Reasoning

This paper introduces a new approach for multi-agent communication learn...
research
09/02/2021

MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

This work considers the problem of learning cooperative policies in mult...
research
12/06/2018

Scene Dynamics: Counterfactual Critic Multi-Agent Training for Scene Graph Generation

Scene graphs -- objects as nodes and visual relationships as edges -- de...
research
09/02/2022

Semi-Centralised Multi-Agent Reinforcement Learning with Policy-Embedded Training

Centralised training (CT) is the basis for many popular multi-agent rein...
research
05/19/2023

Counterfactual Fairness Filter for Fair-Delay Multi-Robot Navigation

Multi-robot navigation is the task of finding trajectories for a team of...
