Centralized Model and Exploration Policy for Multi-Agent RL

07/14/2021
by Qizhen Zhang, et al.

Reinforcement learning (RL) in partially observable, fully cooperative multi-agent settings (Dec-POMDPs) can in principle be used to address many real-world challenges such as controlling a swarm of rescue robots or a synchronous team of quadcopters. However, Dec-POMDPs are significantly harder to solve than single-agent problems: Dec-POMDPs are NEXP-complete, whereas single-agent MDPs are only P-complete. Hence, current RL algorithms for Dec-POMDPs suffer from poor sample efficiency, which limits their applicability to practical problems where environment interaction is costly. Our key insight is that using just a polynomial number of samples, one can learn a centralized model that generalizes across different policies. We can then optimize the policy within the learned model instead of the true system, reducing the number of environment interactions. We also learn a centralized exploration policy within our model that collects additional data in state-action regions with high model uncertainty. Finally, we empirically evaluate the proposed model-based algorithm, MARCO, in three cooperative communication tasks, where it improves sample efficiency by up to 20x.
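The idea of steering data collection toward state-action regions where a centralized model is uncertain can be illustrated with a toy sketch. This is not the MARCO implementation: the class `CentralizedCountModel`, the function `explore_action`, the visit-count uncertainty proxy, and the greedy one-step action choice are all assumptions made for illustration, standing in for the learned model and the learned exploration policy described in the abstract.

```python
import math
from collections import Counter
from itertools import product


class CentralizedCountModel:
    """Hypothetical tabular model over (state, joint_action) pairs."""

    def __init__(self):
        # Maps (state, joint_action) -> Counter of observed next states.
        self.transitions = {}

    def update(self, state, joint_action, next_state):
        """Record one environment transition."""
        key = (state, joint_action)
        self.transitions.setdefault(key, Counter())[next_state] += 1

    def visits(self, state, joint_action):
        """Number of times this state and joint action have been observed."""
        return sum(self.transitions.get((state, joint_action), Counter()).values())

    def uncertainty(self, state, joint_action):
        """Assumed proxy: uncertainty shrinks as the visit count grows."""
        return 1.0 / math.sqrt(1 + self.visits(state, joint_action))

    def most_likely_next(self, state, joint_action):
        """Most frequently observed next state, or None if never visited."""
        seen = self.transitions.get((state, joint_action))
        return seen.most_common(1)[0][0] if seen else None


def explore_action(model, state, joint_actions):
    """Greedy stand-in for an exploration policy: pick the joint action
    the model is least certain about (ties go to the earliest candidate)."""
    return max(joint_actions, key=lambda a: model.uncertainty(state, a))


# Two agents, each with two actions, give four joint actions.
joint_actions = list(product([0, 1], repeat=2))

model = CentralizedCountModel()
model.update("s0", (0, 0), "s1")
model.update("s0", (0, 0), "s1")
model.update("s0", (0, 1), "s0")

# The unvisited joint actions have the highest uncertainty.
chosen = explore_action(model, "s0", joint_actions)
```

In this sketch the model is centralized in the sense that it is indexed by the joint action of all agents. A learned exploration policy would plan over multiple steps inside the model rather than choosing greedily.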

research
10/03/2019

Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

Many real world tasks require multiple agents to work together. Multi-ag...
research
03/05/2021

MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models

Multi-robot systems can benefit from reinforcement learning (RL) algorit...
research
07/25/2018

Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

Reinforcement Learning (RL) is a learning paradigm concerned with learni...
research
11/18/2019

Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning

Existing value-factorized based Multi-Agent deep Reinforcement Learning...
research
11/13/2018

Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG

Modelling and exploiting teammates' policies in cooperative multi-agent ...
research
02/26/2022

A Scalable Graph-Theoretic Distributed Framework for Cooperative Multi-Agent Reinforcement Learning

The main challenge of large-scale cooperative multi-agent reinforcement ...
research
01/31/2012

Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration

We present an implementation of model-based online reinforcement learnin...
