QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

09/09/2020

by Jian Hu, et al.

In cooperative Multi-Agent Reinforcement Learning (MARL) under the Centralized Training with Decentralized Execution (CTDE) setting, agents observe and interact with their environment locally and independently. With local observation and random sampling, randomness in rewards and observations induces randomness in long-term returns. Existing methods such as Value Decomposition Networks (VDN) and QMIX estimate the long-term return as a scalar, which discards information about its randomness. Our proposed model, QR-MIX, introduces quantile regression and models the joint state-action value as a distribution by combining QMIX with the Implicit Quantile Network (IQN). However, the monotonicity constraint in QMIX limits the expressiveness of the joint state-action value distribution and may lead to incorrect estimates in non-monotonic cases. We therefore propose a flexible loss function that approximates the hard monotonicity constraint of QMIX. As a result, our model is more tolerant both of randomness in returns and of violations of the monotonicity constraint. Experimental results demonstrate that QR-MIX outperforms the previous state-of-the-art method QMIX in the StarCraft Multi-Agent Challenge (SMAC) environment.
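To make the quantile-regression idea concrete, the sketch below shows the standard quantile Huber loss used in IQN-style distributional RL, which is the kind of objective a model like QR-MIX would minimize over the joint return distribution. This is an illustrative NumPy implementation of the generic loss, not the authors' code; the function name and shapes are assumptions.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, taus, kappa=1.0):
    """Quantile Huber loss from QR-DQN/IQN (illustrative sketch).

    pred_quantiles: (N,) predicted quantile values at fractions taus
    target_samples: (M,) samples from the target return distribution
    taus:           (N,) quantile fractions in (0, 1)
    kappa:          Huber threshold
    """
    # Pairwise TD errors between every target sample and every
    # predicted quantile: u[i, j] = target_i - prediction_j
    u = target_samples[:, None] - pred_quantiles[None, :]

    # Element-wise Huber loss: quadratic near zero, linear in the tails
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))

    # Asymmetric quantile weighting |tau - 1{u < 0}| penalizes
    # over- and under-estimation differently at each fraction tau
    weight = np.abs(taus[None, :] - (u < 0).astype(float))

    return (weight * huber / kappa).mean()
```

Minimizing this loss drives each predicted value toward the tau-quantile of the target distribution, so the network represents the full return distribution rather than only its mean.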


Related research

DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning (02/21/2022)

MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning (06/22/2021)

Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning (12/23/2021)

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning (03/19/2020)

RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents (02/16/2021)

ReMIX: Regret Minimization for Monotonic Value Function Factorization in Multiagent Reinforcement Learning (02/11/2023)

Weighted QMIX: Expanding Monotonic Value Function Factorisation (06/18/2020)
