Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

06/15/2022
by   Wei Fu, et al.

Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks whose parameters are shared across agents. This algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite these advantages, we revisit the two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions for implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors, and we empirically validate our findings on a variety of domains, ranging from simplified matrix and grid-world games to complex benchmarks such as the StarCraft Multi-Agent Challenge and Google Research Football. We hope our insights can benefit the community in developing more general and more powerful MARL algorithms. See our project website at https://sites.google.com/view/revisiting-marl.
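To make the two design principles concrete, here is a minimal sketch of additive value decomposition (VDN-style) with parameter sharing, using NumPy. All names (`local_q`, `joint_q`, the linear "network" `W`) are illustrative simplifications, not the paper's implementation: a single shared parameter matrix plays the role of the shared local Q-network, and the joint Q-value is the sum of per-agent local Q-values, which is what makes decentralized greedy execution consistent with the centralized objective.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Parameter sharing: one set of local Q-network parameters used by all
# agents. Here the "network" is just a linear map for illustration.
n_agents, obs_dim, n_actions = 3, 4, 2
W = rng.normal(size=(obs_dim, n_actions))  # shared across agents

def local_q(obs):
    """Per-agent Q-values Q_i(o_i, .) from the shared parameters."""
    return obs @ W  # shape: (n_actions,)

def joint_q(observations, actions):
    """Additive value decomposition: Q_tot = sum_i Q_i(o_i, a_i)."""
    return sum(local_q(o)[a] for o, a in zip(observations, actions))

# Decentralized execution: each agent greedily maximizes its own local Q.
observations = rng.normal(size=(n_agents, obs_dim))
greedy_actions = [int(np.argmax(local_q(o))) for o in observations]

# Because Q_tot is a sum of per-agent terms, the per-agent greedy
# actions also maximize the centralized Q_tot (brute-force check).
best = max(product(range(n_actions), repeat=n_agents),
           key=lambda a: joint_q(observations, a))
assert list(best) == greedy_actions
```

The brute-force check illustrates why CTDE works under additive decomposition: the joint argmax factorizes into independent per-agent argmaxes. The paper's point is that this convenience is not free; in environments with highly multi-modal reward landscapes, forcing all agents through a shared, decomposed value function can rule out the optimal joint behavior.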

Related research

07/29/2021
Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training
Much work has been dedicated to the exploration of Multi-Agent Reinforce...

06/16/2017
Value-Decomposition Networks For Cooperative Multi-Agent Learning
We study the problem of cooperative multi-agent reinforcement learning w...

09/22/2021
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning
Cooperative multi-agent reinforcement learning (MARL) faces significant ...

01/03/2022
A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning
Centralized Training for Decentralized Execution, where training is done...

03/28/2022
UNMAS: Multi-Agent Reinforcement Learning for Unshaped Cooperative Scenarios
Multi-agent reinforcement learning methods such as VDN, QMIX, and QTRAN ...

05/31/2020
Towards Understanding Linear Value Decomposition in Cooperative Multi-Agent Q-Learning
Value decomposition is a popular and promising approach to scaling up mu...

08/07/2022
Maximum Correntropy Value Decomposition for Multi-agent Deep Reinforcement Learning
We explore value decomposition solutions for multi-agent deep reinforcem...
