Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG

11/13/2018
by Hangyu Mao, et al.

Modelling and exploiting teammates' policies in cooperative multi-agent systems has long been both an interest and a major challenge for the reinforcement learning (RL) community. The interest lies in the fact that if an agent knows its teammates' policies, it can adjust its own policy accordingly to achieve proper cooperation. The challenge is that the agents' policies change continuously because they are learning concurrently, which makes it difficult to model the teammates' dynamic policies accurately. In this paper, we present ATTention Multi-Agent Deep Deterministic Policy Gradient (ATT-MADDPG) to address this challenge. ATT-MADDPG extends DDPG, a single-agent actor-critic RL method, with two special designs. First, to model the teammates' policies, the agent needs access to the teammates' observations and actions; ATT-MADDPG adopts a centralized critic to collect this information. Second, to model the teammates' policies from the collected information effectively, ATT-MADDPG enhances the centralized critic with an attention mechanism. This attention mechanism introduces a special structure that explicitly models the dynamic joint policy of teammates, ensuring that the collected information is processed efficiently. We evaluate ATT-MADDPG on both benchmark tasks and real-world packet routing tasks. Experimental results show that it not only outperforms state-of-the-art RL-based and rule-based methods by a large margin, but also achieves better scalability and robustness.
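
The abstract describes the attention-enhanced critic only at a high level. As a rough illustration, the pure-Python sketch below assumes that the centralized critic produces several candidate Q-values for the agent's own observation and action, one per head, and that a scaled dot-product attention score between each head's key vector and an embedding of the teammates' actions decides how much each candidate contributes to the final value. The names `attention_critic_value`, `head_q_values`, `head_keys`, and `teammate_query` are hypothetical. In a real critic these vectors would come from learned networks, and this simplified weighting is not necessarily the paper's exact architecture.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_critic_value(
    head_q_values: Sequence[float],      # hypothetical: one Q estimate per head
    head_keys: Sequence[Sequence[float]],  # hypothetical: one key vector per head
    teammate_query: Sequence[float],      # hypothetical: embedding of teammates' actions
) -> Tuple[float, List[float]]:
    """Weight the per-head Q-values by attention over the teammates' embedding.

    Assumption for illustration: each head stands for one hypothesis about the
    teammates' joint behaviour, and the attention weights express how well the
    current teammate embedding matches that hypothesis.
    """
    if len(head_q_values) != len(head_keys):
        raise ValueError("need exactly one key vector per Q-value head")
    if any(len(k) != len(teammate_query) for k in head_keys):
        raise ValueError("key and query dimensions must match")
    scale = math.sqrt(len(teammate_query))
    scores = [dot(k, teammate_query) / scale for k in head_keys]
    weights = softmax(scores)
    q = sum(w * qk for w, qk in zip(weights, head_q_values))
    return q, weights


if __name__ == "__main__":
    # Two heads; the teammate embedding points toward the second key,
    # so the second head's Q-value should dominate the combined estimate.
    q, weights = attention_critic_value([1.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0])
    print(q, weights)
```

Because the weights are recomputed from the teammates' current actions at every step, the combined value shifts as teammates' behaviour changes, which is the intuition behind modelling a dynamic joint policy with attention.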



research
10/05/2018

Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Reinforcement learning in multi-agent scenarios is important for real-wo...
research
12/27/2021

A Graph Attention Learning Approach to Antenna Tilt Optimization

6G will move mobile networks towards increasing levels of complexity. To...
research
07/12/2022

Towards Global Optimality in Cooperative MARL with Sequential Transformation

Policy learning in multi-agent reinforcement learning (MARL) is challeng...
research
10/02/2021

AB-Mapper: Attention and BicNet Based Multi-agent Path Finding for Dynamic Crowded Environment

Multi-agent path finding in dynamic crowded environments is of great aca...
research
06/27/2021

Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

Recent works have applied the Proximal Policy Optimization (PPO) to the ...
research
06/16/2020

Local Information Opponent Modelling Using Variational Autoencoders

Modelling the behaviours of other agents (opponents) is essential for un...
research
07/14/2021

Centralized Model and Exploration Policy for Multi-Agent RL

Reinforcement learning (RL) in partially observable, fully cooperative m...
