Correcting Experience Replay for Multi-Agent Communication

by Sanjeevan Ahilan, et al.

We consider the problem of learning to communicate using multi-agent reinforcement learning (MARL). A common approach is to learn off-policy, using data sampled from a replay buffer. However, messages received in the past may not accurately reflect the current communication policy of each agent, and this complicates learning. We therefore introduce a 'communication correction' which accounts for the non-stationarity of observed communication induced by multi-agent learning. It works by relabelling the received message to make it likely under the communicator's current policy, so that it better reflects the receiver's current environment. To account for cases in which agents are both senders and receivers, we introduce an ordered relabelling scheme. Our correction is computationally efficient and can be integrated with a range of off-policy algorithms. It substantially improves the ability of communicating MARL systems to learn across a variety of cooperative and competitive tasks.
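The relabelling idea can be illustrated with a minimal sketch. This is not the authors' implementation: the batch layout (`sender_obs`, `receiver_msg`), the helper `relabel_messages`, and the toy deterministic `tanh` communication policy are all assumptions made for illustration, and the sketch covers only a single sender and receiver rather than the ordered scheme for agents that both send and receive.

```python
import numpy as np


def make_policy(weights):
    """Hypothetical deterministic communication policy: obs -> message."""
    return lambda obs: np.tanh(weights @ obs)


def relabel_messages(batch, sender_policy):
    """Return a copy of the batch in which each stored message is replaced
    by the message the sender's CURRENT policy would emit for the same
    stored sender observation. The input batch is left unmodified."""
    corrected = dict(batch)
    corrected["receiver_msg"] = np.stack(
        [sender_policy(obs) for obs in batch["sender_obs"]]
    )
    return corrected


rng = np.random.default_rng(0)
old_w = rng.normal(size=(2, 3))   # sender weights when the data was collected
new_w = old_w + 0.5               # sender weights after further training

# Toy replay batch: 4 transitions, 3-dim sender observations, 2-dim messages.
sender_obs = rng.normal(size=(4, 3))
old_policy = make_policy(old_w)
batch = {
    "sender_obs": sender_obs,
    "receiver_msg": np.stack([old_policy(obs) for obs in sender_obs]),
}

# Before the receiver's update, swap stale messages for current ones.
corrected = relabel_messages(batch, make_policy(new_w))
```

In this sketch the receiver would then be trained on `corrected` rather than `batch`, so the messages it conditions on match what the sender would currently say in the recorded situation.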





Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy

Experience replay (ER) improves the data efficiency of off-policy reinfo...

Minimizing Communication while Maximizing Performance in Multi-Agent Reinforcement Learning

Inter-agent communication can significantly increase performance in mult...

MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer

In this paper, we consider cooperative multi-agent reinforcement learnin...

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Many real-world problems, such as network packet routing and urban traff...

Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks

Learning when to communicate and doing that effectively is essential in ...

Connectivity-driven Communication in Multi-agent Reinforcement Learning through Diffusion Processes on Graphs

We discuss the problem of learning collaborative behaviour in multi-agen...

Modeling Social Group Communication with Multi-Agent Imitation Learning

In crowded social scenarios with a myriad of external stimuli, human bra...