The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

03/02/2021
by Chao Yu, et al.

Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm, but it is used significantly less often than off-policy learning algorithms in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a multi-agent PPO variant that adopts a centralized value function. Using a 1-GPU desktop, we show that MAPPO achieves performance comparable to the state of the art in three popular multi-agent testbeds: the Particle World environments, the StarCraft II Micromanagement Tasks, and the Hanabi Challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that, compared to off-policy baselines, MAPPO achieves better or comparable sample complexity as well as substantially faster running time. Finally, we present the five factors most influential to MAPPO's practical performance, supported by ablation studies.
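
To make the phrase "a multi-agent PPO variant with a centralized value function" concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a fully cooperative setting in which all agents share one reward, so a single advantage estimate is computed from a critic that sees the global state and is then reused in each agent's standard PPO clipped surrogate. The linear critic, the one-step advantage, the averaging over agents, and every function and parameter name (`centralized_value`, `one_step_advantage`, `mappo_policy_objective`, `clip_eps`, `gamma`) are illustrative assumptions; the abstract does not specify any of these details.

```python
import math


def centralized_value(global_state, weights, bias=0.0):
    """Hypothetical linear critic V(s) over the global state (assumption:
    a real critic would be a neural network)."""
    return sum(w * s for w, s in zip(weights, global_state)) + bias


def one_step_advantage(reward, value, next_value, gamma=0.99):
    """One-step TD advantage, r + gamma * V(s') - V(s). Assumption: used
    here for brevity in place of a multi-step estimator."""
    return reward + gamma * next_value - value


def clipped_surrogate(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for a single action."""
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def mappo_policy_objective(new_logps, old_logps, advantage, clip_eps=0.2):
    """Average the clipped surrogate over agents, all of which reuse the
    same advantage from the centralized critic (assumption: shared reward)."""
    terms = [
        clipped_surrogate(n, o, advantage, clip_eps)
        for n, o in zip(new_logps, old_logps)
    ]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    critic_weights = [0.5, -0.25]  # hypothetical critic parameters
    v_now = centralized_value([1.0, 2.0], critic_weights)
    v_next = centralized_value([2.0, 2.0], critic_weights)
    adv = one_step_advantage(reward=1.0, value=v_now, next_value=v_next)
    # Two agents act from local observations; log-probs are made-up numbers.
    objective = mappo_policy_objective([-0.9, -1.1], [-1.0, -1.0], adv)
    print(objective)
```

The design point the sketch tries to show is the split between training and execution: the critic consumes global information during training only, while each agent's policy term depends on nothing but its own action log-probabilities.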

research
02/06/2021

RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

In recent years, Multi-Agent Reinforcement Learning (MARL) has revolutio...
research
05/08/2023

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

Policy optimization methods with function approximation are widely used ...
research
02/10/2021

Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning

Value-based methods of multi-agent reinforcement learning (MARL), especi...
research
02/23/2023

Concept Learning for Interpretable Multi-Agent Reinforcement Learning

Multi-agent robotic systems are increasingly operating in real-world env...
research
08/08/2023

Communication-Efficient Cooperative Multi-Agent PPO via Regulated Segment Mixture in Internet of Vehicles

Multi-Agent Reinforcement Learning (MARL) has become a classic paradigm ...
research
10/13/2022

Multi-agent Dynamic Algorithm Configuration

Automated algorithm configuration relieves users from tedious, trial-and...
research
11/07/2021

Coordinated Proximal Policy Optimization

We present Coordinated Proximal Policy Optimization (CoPPO), an algorith...
