Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

02/10/2020
by Yaodong Yang, et al.

In many real-world settings, a team of cooperative agents must learn to coordinate its behavior under private observations and communication constraints. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance on these realistic and difficult problems, but challenges remain. One branch of this work is multiagent value decomposition, which decomposes the global shared Q-value Q_tot into individual Q-values Q^i that guide each agent's behavior. However, previous work performs this decomposition heuristically, without sound theoretical grounding: VDN assumes an additive form, and QMIX adopts an implicit mixing method that is hard to interpret. In this paper, for the first time, we theoretically derive a linear decomposition of Q_tot into the individual Q^i. Based on this theoretical finding, we introduce a multi-head attention mechanism to approximate each term in the decomposition formula, with a theoretical explanation for each. Experiments show that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmarks across different scenarios. We also provide an attention analysis with insights.
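
To make the contrast between an additive and a linear, attention-weighted decomposition concrete, the following is a minimal illustrative sketch in plain Python. It is not the paper's architecture: the function names (`vdn_mix`, `attention_linear_mix`), the raw per-head attention scores, and the constant `bias` term are all assumptions introduced here for illustration. In a real model the scores and the bias would presumably be produced by networks conditioned on the state, which this sketch does not attempt.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vdn_mix(agent_qs):
    """Additive (VDN-style) mixing: Q_tot is the plain sum of the Q^i."""
    return sum(agent_qs)


def attention_linear_mix(agent_qs, head_scores, bias=0.0):
    """Hypothetical linear mixing: Q_tot = bias + sum over heads and agents
    of (attention weight * Q^i).

    head_scores holds one list of raw scores per head, with one score per
    agent. Each head's scores are normalized with a softmax, so every
    weight is non-negative and each head's weights sum to 1.
    """
    q_tot = bias
    for scores in head_scores:
        weights = softmax(scores)
        q_tot += sum(w * q for w, q in zip(weights, agent_qs))
    return q_tot


# Example inputs (made up for illustration): three agents, two heads.
agent_qs = [1.0, 2.0, 3.0]
head_scores = [[0.0, 0.0, 0.0], [2.0, 0.0, -1.0]]
q_tot_additive = vdn_mix(agent_qs)
q_tot_attention = attention_linear_mix(agent_qs, head_scores)
```

Under these assumptions, a single head with uniform scores yields the mean of the individual Q-values, and with as many uniform heads as agents the result coincides with the additive sum. Non-uniform scores let each head emphasize different agents while Q_tot stays linear in every Q^i.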

research
02/10/2020

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Recently, deep multiagent reinforcement learning (MARL) has become a hig...
research
04/26/2023

NA^2Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning

Value decomposition is widely used in cooperative multi-agent reinforcem...
research
04/30/2022

An attention model for the formation of collectives in real-world domains

We consider the problem of forming collectives of agents for real-world ...
research
05/31/2020

Towards Understanding Linear Value Decomposition in Cooperative Multi-Agent Q-Learning

Value decomposition is a popular and promising approach to scaling up mu...
research
06/09/2023

Explaining Reinforcement Learning with Shapley Values

For reinforcement learning systems to be widely adopted, their users mus...
research
05/27/2021

Pattern Transfer Learning for Reinforcement Learning in Order Dispatching

Order dispatch is one of the central problems to ride-sharing platforms....
research
05/13/2021

SIDE: I Infer the State I Want to Learn

As one of the solutions to the Dec-POMDP problem, the value decompositio...
