Semi-Centralised Multi-Agent Reinforcement Learning with Policy-Embedded Training

09/02/2022
by Taher Jafferjee, et al.

Centralised training (CT) is the basis for many popular multi-agent reinforcement learning (MARL) methods because it allows agents to quickly learn high-performing policies. However, CT relies on agents learning from one-off observations of other agents' actions at a given state. Because MARL agents explore and update their policies during training, these observations often provide poor predictions of other agents' behaviour and of the expected return for a given action. CT methods therefore suffer from high-variance, error-prone value estimates, which harm learning. CT methods also suffer from explosive growth in complexity due to their reliance on global observations, unless strong factorisation restrictions are imposed (e.g., monotonic value functions in QMIX). We address these challenges with a new semi-centralised MARL framework that performs policy-embedded training and decentralised execution. Our method, the policy embedded reinforcement learning algorithm (PERLA), is an enhancement tool for actor-critic MARL algorithms; it leverages a novel parameter-sharing protocol and a policy embedding method to maintain estimates that account for other agents' behaviour. Our theory proves that PERLA dramatically reduces the variance in value estimates. Unlike various CT methods, PERLA can be seamlessly applied on top of existing MARL algorithms and scales easily with the number of agents without the need for restrictive factorisation assumptions. We demonstrate PERLA's superior empirical performance and efficient scaling in benchmark environments, including StarCraft II Micromanagement and Multi-Agent MuJoCo.
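The abstract's core argument is that a value estimate built from a single observed action of another agent is noisy, whereas an estimate that accounts for that agent's policy is not. The toy sketch below illustrates this general variance-reduction idea under loudly stated assumptions: a hypothetical two-agent payoff table `Q`, a hypothetical fixed policy `PI_J` for the other agent, and illustrative function names. It is not the PERLA algorithm and omits its parameter-sharing protocol; it only compares a one-off sample, an average over `k` sampled actions, and the exact expectation under the other agent's policy.

```python
import random
import statistics

# HYPOTHETICAL two-agent toy problem: Q[a_i][a_j] is the return agent i gets
# for action a_i when agent j plays a_j. The numbers are illustrative only.
Q = [
    [10.0, -5.0, 0.0],
    [2.0, 3.0, 1.0],
]

# HYPOTHETICAL current (stochastic) policy of agent j over its 3 actions.
PI_J = [0.5, 0.3, 0.2]
ACTIONS_J = range(len(PI_J))


def one_off_estimate(a_i, rng):
    """Estimate the value of a_i from one sampled action of agent j."""
    a_j = rng.choices(ACTIONS_J, weights=PI_J, k=1)[0]
    return Q[a_i][a_j]


def sampled_embedded_estimate(a_i, rng, k=8):
    """Average the value of a_i over k actions drawn from agent j's policy."""
    a_js = rng.choices(ACTIONS_J, weights=PI_J, k=k)
    return sum(Q[a_i][a_j] for a_j in a_js) / k


def exact_embedded_estimate(a_i):
    """Expectation of Q[a_i][.] under agent j's policy (no sampling noise)."""
    return sum(p * q for p, q in zip(PI_J, Q[a_i]))


def empirical_variance(estimator, trials=5000, seed=0):
    """Population variance of an estimator's output over repeated draws."""
    rng = random.Random(seed)
    return statistics.pvariance([estimator(rng) for _ in range(trials)])


if __name__ == "__main__":
    a_i = 0
    print("exact value:", exact_embedded_estimate(a_i))  # exact value: 3.5
    print("one-off variance:",
          empirical_variance(lambda r: one_off_estimate(a_i, r)))
    print("k=8 variance:",
          empirical_variance(lambda r: sampled_embedded_estimate(a_i, r)))
```

For action `a_i = 0` the true value is 3.5. The one-off estimator has theoretical variance 45.25, averaging over `k = 8` policy samples divides that by 8 (about 5.66), and the exact expectation has zero variance, so the empirical figures printed above should land near those values.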
