RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

by Jian Hu, et al.

In recent years, Multi-Agent Reinforcement Learning (MARL) has achieved revolutionary breakthroughs through its successful application to multi-agent cooperative scenarios such as computer games and robot swarms. QMIX, a popular cooperative MARL algorithm, does not work well in the Super Hard scenarios of the StarCraft Multi-Agent Challenge (SMAC). Recent variants of QMIX suggest that it is the monotonicity constraint that limits QMIX's performance. However, we investigate the implementation tricks used by these variants and find that it is these tricks that significantly improve performance. QMIX, equipped with these tricks, achieves extraordinarily high win rates in SMAC and becomes the new SOTA. Furthermore, we propose a policy-based algorithm, RIIT, to study the impact of QMIX's monotonicity constraint. RIIT outperforms other policy-based algorithms, benefiting from the monotonicity constraint. Ablation studies of RIIT demonstrate that the monotonicity constraint can actually improve sample efficiency in purely cooperative tasks. Finally, we explain from a theoretical perspective why the monotonicity constraint works well in cooperative tasks. We open-source the code at <https://github.com/hijkzzz/pymarl2>
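The monotonicity constraint discussed above can be sketched as follows. This is a minimal illustrative example, not the paper's code: QMIX mixes per-agent utilities into a joint value Q_tot using weights forced to be non-negative (here via `abs`), which guarantees that increasing any individual agent's Q-value can never decrease Q_tot. The function name and toy numbers are hypothetical.

```python
# Minimal sketch (hypothetical, not the authors' implementation) of QMIX's
# monotonicity constraint: mixing weights are made non-negative via abs(),
# so dQ_tot / dQ_i >= 0 for every agent i.

def monotonic_mix(agent_qs, weights, bias):
    """Combine per-agent Q-values with non-negative mixing weights."""
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias

# Raising a single agent's Q-value never lowers the joint value, so each
# agent's greedy action choice is consistent with the joint greedy action.
base = monotonic_mix([1.0, 2.0], [-0.5, 0.3], 0.1)
better = monotonic_mix([1.5, 2.0], [-0.5, 0.3], 0.1)
assert better >= base
```

In the actual algorithm the weights and bias are produced by hypernetworks conditioned on the global state; the key point is only that the weights are constrained to be non-negative.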




