PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning

06/22/2022
by   Hanhan Zhou, et al.

Multi-agent reinforcement learning (MARL) has seen significant progress with the development of value function factorization methods. These methods optimize a joint action-value function by maximizing factorized per-agent utilities, which is possible because the factorization is monotonic. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions can impose concurrent constraints (across different states) on the representable function class, causing significant estimation error during training. We tackle this limitation and propose PAC, a new framework that leverages Assistive information generated from Counterfactual Predictions of optimal joint action selection, which enables explicit assistance to value function factorization through a novel counterfactual loss. A variational inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline. To enable decentralized execution, we also derive factorized per-agent policies inspired by a maximum-entropy MARL framework. We evaluate PAC on multi-agent predator-prey and a set of StarCraft II micromanagement tasks. Empirical results show that PAC outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms on all benchmarks.
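
The monotonicity property mentioned in the abstract can be illustrated with a small sketch. It is not the PAC method itself. It assumes a simple linear mixer with non-negative weights (a simplification of the monotonic mixing used by methods such as QMIX), and all function and variable names are hypothetical. Under that assumption, letting each agent greedily maximize its own utility selects the same joint action as an exhaustive search over the mixed joint value.

```python
import itertools
import random


def monotonic_mix(utilities, weights, bias):
    """Mix per-agent utilities; abs() keeps every weight non-negative,
    so the mixed value is monotonic in each agent's utility."""
    return sum(abs(w) * q for w, q in zip(weights, utilities)) + bias


def greedy_joint_action(q_tables):
    """Decentralized selection: each agent picks the argmax of its own utilities."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def brute_force_joint_action(q_tables, weights, bias):
    """Centralized selection: search all joint actions for the best mixed value."""
    joint_actions = itertools.product(*(range(len(q)) for q in q_tables))

    def mixed_value(joint_action):
        chosen = [q[a] for q, a in zip(q_tables, joint_action)]
        return monotonic_mix(chosen, weights, bias)

    return max(joint_actions, key=mixed_value)


# Toy setup (assumed for illustration): 3 agents, 4 actions each.
rng = random.Random(0)
q_tables = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
weights = [rng.uniform(0.1, 2.0) for _ in range(3)]  # strictly positive
bias = 0.5

decentralized = greedy_joint_action(q_tables)
centralized = brute_force_joint_action(q_tables, weights, bias)
print(decentralized == centralized)
```

The same monotonic constraint that makes decentralized greedy selection valid also restricts which joint value functions can be represented, which is the limitation the abstract targets.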

research
05/14/2019

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning

We explore value-based solutions for multi-agent reinforcement learning ...
research
04/03/2023

Effective and Stable Role-Based Multi-Agent Collaboration by Structural Information Principles

Role-based learning is a promising approach to improving the performance...
research
06/04/2023

A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning

In fully cooperative multi-agent reinforcement learning (MARL) settings,...
research
03/30/2018

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

In many real-world settings, a team of agents must coordinate their beha...
research
04/03/2019

Robust Multi-agent Counterfactual Prediction

We consider the problem of using logged data to make predictions about w...
research
01/27/2023

Policy-Value Alignment and Robustness in Search-based Multi-Agent Learning

Large-scale AI systems that combine search and learning have reached sup...
research
09/08/2023

Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning

In this paper, we propose a novel model-based multi-agent reinforcement ...
