RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning
In recent years, Multi-Agent Reinforcement Learning (MARL) has achieved revolutionary breakthroughs through its successful application to multi-agent cooperative scenarios such as computer games and robot swarms. As a popular cooperative MARL algorithm, QMIX does not perform well in the Super Hard scenarios of the StarCraft Multi-Agent Challenge (SMAC). Recent variants of QMIX suggest that it may be the monotonicity constraint that limits QMIX's performance. However, we investigate the implementation tricks used in these variants and find that it is these tricks that significantly improve the algorithms' performance. QMIX, equipped with these tricks, achieves extraordinarily high win rates in SMAC and becomes the new SOTA. Furthermore, we propose a policy-based algorithm, RIIT, to study the impact of QMIX's monotonicity constraint. RIIT outperforms other policy-based algorithms, and this advantage stems from the monotonicity constraint. Ablation studies of RIIT demonstrate that the monotonicity constraint can in fact improve sample efficiency in purely cooperative tasks. Finally, we explain from a theoretical perspective why the monotonicity constraint works well in cooperative tasks. We open-source the code at <https://github.com/hijkzzz/pymarl2>
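For context, the monotonicity constraint discussed above is the condition QMIX imposes on its mixing network: the joint action-value must be non-decreasing in every agent's individual utility. Using the standard QMIX notation (not defined in this abstract), it can be written as

\[ \frac{\partial Q_{tot}(\boldsymbol{\tau}, \mathbf{u})}{\partial Q_i(\tau_i, u_i)} \ge 0, \qquad \forall i \in \{1, \dots, N\}, \]

which guarantees that each agent greedily maximizing its own Q_i yields the same joint action as maximizing Q_tot, so decentralized execution remains consistent with centralized training.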