ReMIX: Regret Minimization for Monotonic Value Function Factorization in Multiagent Reinforcement Learning

02/11/2023
by Yongsheng Mei, et al.

Value function factorization methods have become a dominant approach to cooperative multiagent reinforcement learning under the centralized training with decentralized execution paradigm. By factorizing the optimal joint action-value function with a monotonic mixing function of agents' utilities, these algorithms ensure consistency between joint and local action selections for decentralized decision-making. Nevertheless, the use of monotonic mixing functions also induces representational limitations, and finding the optimal projection of an unrestricted mixing function onto the monotonic function class remains an open problem. To this end, we propose ReMIX, which formulates this optimal projection problem for value function factorization as regret minimization over the projection weights of different state-action values. The resulting optimization problem can be relaxed and solved with the Lagrangian multiplier method to obtain closed-form optimal projection weights. By minimizing the resulting policy regret, we narrow the gap between the optimal and the restricted monotonic mixing functions, thus obtaining an improved monotonic value function factorization. Our experimental results on the Predator-Prey and StarCraft Multi-Agent Challenge environments demonstrate the effectiveness of our method and its improved ability to handle environments with non-monotonic value functions.
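To illustrate how per-sample projection weights enter a monotonic factorization objective, here is a minimal PyTorch sketch of a weighted projection loss over a batch of state-action values. The abstract does not give ReMIX's closed-form regret-minimizing weights, so `placeholder_weights` below is a hypothetical stand-in (in the spirit of weighting schemes such as Weighted QMIX), not the weights derived in the paper.

```python
# Minimal sketch: weighted projection loss for monotonic value function
# factorization. The weighting rule is a placeholder assumption; ReMIX's
# closed-form projection weights are not reproduced here.
import torch

def weighted_factorization_loss(q_tot, y_target, projection_weights):
    """Weighted squared projection error between the monotonic mixer's
    estimate q_tot and the (unrestricted) target values y_target.

    q_tot:              [batch] joint action values from the monotonic mixer
    y_target:           [batch] bootstrapped target values
    projection_weights: [batch] non-negative weights over state-action pairs
    """
    td_error = q_tot - y_target
    return (projection_weights * td_error.pow(2)).mean()

def placeholder_weights(q_tot, y_target, alpha=0.5):
    # Hypothetical weighting: downweight samples where the restricted
    # (monotonic) estimate already exceeds the target, so the projection
    # focuses on state-action pairs it still underestimates.
    with torch.no_grad():
        return torch.where(q_tot > y_target,
                           torch.full_like(q_tot, alpha),
                           torch.ones_like(q_tot))
```

In training, the weights would be recomputed per batch and passed to the loss, e.g. `weighted_factorization_loss(q_tot, y_target, placeholder_weights(q_tot, y_target))`.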

