Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

06/27/2021
by Jian Hu, et al.
Recent works have applied Proximal Policy Optimization (PPO) to cooperative multi-agent tasks, for example Independent PPO (IPPO) and vanilla Multi-agent PPO (MAPPO), which uses a centralized value function. However, previous literature shows that MAPPO may not perform as well as IPPO or the Fine-tuned QMIX on the StarCraft Multi-Agent Challenge (SMAC). MAPPO-Feature-Pruned (MAPPO-FP) improves the performance of MAPPO through carefully designed agent-specific features, which may reduce the algorithm's general utility. By contrast, we find that MAPPO may suffer from Policies Overfitting in Multi-agent Cooperation (POMAC), because the agents learn their policies from sampled advantage values. POMAC may in turn cause the multi-agent policies to be updated in a suboptimal direction and prevent the agents from exploring better trajectories. In this paper, to mitigate this overfitting, we propose a novel policy regularization method that perturbs the advantage values with random Gaussian noise. The experimental results show that our method outperforms the Fine-tuned QMIX and MAPPO-FP, and achieves SOTA on SMAC without agent-specific features. We open-source the code at <https://github.com/hijkzzz/noisy-mappo>.
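
As a rough illustration of the idea in the abstract, the sketch below adds zero-mean Gaussian noise to a batch of advantage values before they enter a standard PPO clipped surrogate objective. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say where or how the noise is sampled, so the function names `perturb_advantages` and `ppo_clip_objective`, the `sigma` and `seed` parameters, and the choice of one independent noise sample per advantage value are all hypothetical.

```python
import random


def perturb_advantages(advantages, sigma=1.0, seed=None):
    """Return a copy of `advantages` with zero-mean Gaussian noise added.

    Assumption: one independent noise sample per advantage value, with a
    single standard deviation `sigma` (a hypothetical hyperparameter).
    """
    rng = random.Random(seed)
    return [a + rng.gauss(0.0, sigma) for a in advantages]


def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over the batch.

    `ratios` are the probability ratios pi_new(a|s) / pi_old(a|s).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * a, clipped_r * a)
    return total / len(advantages)


# Usage: perturb the sampled advantages, then evaluate the surrogate.
sampled_advantages = [0.5, -1.0, 2.0]
noisy_advantages = perturb_advantages(sampled_advantages, sigma=0.1, seed=0)
objective = ppo_clip_objective([1.0, 0.9, 1.3], noisy_advantages)
```

With `sigma=0.0` the perturbation is a no-op and the objective reduces to ordinary PPO, which makes the noise scale a single knob for how strongly the policy update is regularized in this sketch.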

