MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning

06/22/2021
by Zhiwei Xu, et al.

Many real-world tasks require multiple agents to cooperate while each has access only to local observations. To solve such problems, many multi-agent reinforcement learning methods based on Centralized Training with Decentralized Execution have been proposed. One representative class of work is value decomposition, which decomposes the global joint Q-value Q_jt into individual Q-values Q_a that guide each agent's behavior, e.g. VDN (Value-Decomposition Networks) and QMIX. However, these baselines often ignore the randomness in the environment. We propose MMD-MIX, a method that combines distributional reinforcement learning with value decomposition to alleviate this weakness. In addition, to improve data sampling efficiency, we draw on REM (Random Ensemble Mixture), a robust RL algorithm, and explicitly introduce randomness into MMD-MIX. Experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
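The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth, the biased estimator, and the function names `gaussian_kernel`, `mmd_squared` and `random_convex_mixture` are all assumptions chosen for illustration. The first function estimates the squared Maximum Mean Discrepancy between two sets of scalar return samples; the second mixes several value estimates with random convex weights, in the spirit of REM.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the kernel choice is an assumption."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def random_convex_mixture(values, rng):
    """REM-style mixing: combine estimates with random weights that sum to 1."""
    raw = [rng.random() for _ in values]
    total = sum(raw)
    weights = [r / total for r in raw]
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical return samples, used only to exercise the functions.
predicted_samples = [0.9, 1.1, 1.4]
target_samples = [1.0, 1.2, 1.5]
loss = mmd_squared(predicted_samples, target_samples)
mixed = random_convex_mixture([1.0, 2.0, 3.0], random.Random(0))
```

Here `loss` is small when the two sample sets resemble each other and grows as they diverge, which is what makes MMD usable as a distance between return distributions. The mixed value always lies between the smallest and largest input estimate because the weights are non-negative and sum to 1.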

research
09/09/2020

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the s...
research
02/04/2023

Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative Multi-Agent Reinforcement Learning

Value decomposition methods have gradually become popular in the coopera...
research
09/20/2022

Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning

In cooperative multi-agent reinforcement learning, centralized training ...
research
07/08/2022

Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning

Deep cooperative multi-agent reinforcement learning has demonstrated its...
research
02/16/2021

DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning

In fully cooperative multi-agent reinforcement learning (MARL) settings,...
research
02/16/2021

RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

Current value-based multi-agent reinforcement learning methods optimize ...
research
05/13/2021

SIDE: I Infer the State I Want to Learn

As one of the solutions to the Dec-POMDP problem, the value decompositio...
