MAC-PO: Multi-Agent Experience Replay via Collective Priority Optimization

02/21/2023
by   Yongsheng Mei, et al.
4

Experience replay is crucial for off-policy reinforcement learning (RL) methods. By remembering and reusing the experiences from past different policies, experience replay significantly improves the training efficiency and stability of RL algorithms. Many decision-making problems in practice naturally involve multiple agents and require multi-agent reinforcement learning (MARL) under centralized training decentralized execution paradigm. Nevertheless, existing MARL algorithms often adopt standard experience replay where the transitions are uniformly sampled regardless of their importance. Finding prioritized sampling weights that are optimized for MARL experience replay has yet to be explored. To this end, we propose MAC-PO, which formulates optimal prioritized experience replay for multi-agent problems as a regret minimization over the sampling weights of transitions. Such optimization is relaxed and solved using the Lagrangian multiplier approach to obtain the close-form optimal sampling weights. By minimizing the resulting policy regret, we can narrow the gap between the current policy and a nominal optimal policy, thus acquiring an improved prioritization scheme for multi-agent tasks. Our experimental results on Predator-Prey and StarCraft Multi-Agent Challenge environments demonstrate the effectiveness of our method, having a better ability to replay important transitions and outperforming other state-of-the-art baselines.

READ FULL TEXT
research
05/31/2023

AccMER: Accelerating Multi-Agent Experience Replay with Cache Locality-aware Prioritization

Multi-Agent Experience Replay (MER) is a key component of off-policy rei...
research
02/28/2017

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Many real-world problems, such as network packet routing and urban traff...
research
10/02/2020

Correcting Experience Replay for Multi-Agent Communication

We consider the problem of learning to communicate using multi-agent rei...
research
10/08/2020

Prioritized Level Replay

Simulated environments with procedurally generated content have become p...
research
01/25/2023

Discriminative Experience Replay for Efficient Multi-agent Reinforcement Learning

In cooperative multi-agent tasks, parameter sharing among agents is a co...
research
05/15/2021

Regret Minimization Experience Replay

Experience replay is widely used in various deep off-policy reinforcemen...
research
12/08/2021

Replay For Safety

Experience replay <cit.> is a widely used technique to achieve efficient...

Please sign up or login with your details

Forgot password? Click here to reset