MAVEN: Multi-Agent Variational Exploration

10/16/2019
by Anuj Mahajan, et al.

Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments [43]. We specifically focus on QMIX [40], the current state-of-the-art in this domain. We show that the representational constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. Furthermore, we propose a novel approach called MAVEN that hybridises value-based and policy-based methods by introducing a latent space for hierarchical control. The value-based agents condition their behaviour on the shared latent variable controlled by a hierarchical policy. This allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks. Our experimental results show that MAVEN achieves significant performance improvements on the challenging SMAC domain [43].
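
The latent-conditioning idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tabular, stateless per-agent utilities, the discrete latent space, and every name below (`sample_latent`, `make_q_tables`, `greedy_joint_action`, `run_episode`) are assumptions made for illustration only. It shows just one point: a hierarchical policy picks a shared latent value once per episode, and every agent then acts greedily on utilities conditioned on that value, so the joint behaviour stays committed for the whole episode.

```python
import random

# All sizes are hypothetical toy values, not taken from the paper.
N_AGENTS = 3
N_ACTIONS = 4
N_LATENT = 2


def sample_latent(rng, probs):
    """Stand-in for the hierarchical policy: draw one latent value z."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def make_q_tables(rng):
    """Toy per-agent utilities indexed as q[agent][z][action].

    A real agent would also condition on its local observation history;
    that is omitted here to keep the sketch short.
    """
    return [
        [[rng.random() for _ in range(N_ACTIONS)] for _ in range(N_LATENT)]
        for _ in range(N_AGENTS)
    ]


def greedy_joint_action(q_tables, z):
    """Each agent independently picks its argmax action given the shared z."""
    return [max(range(N_ACTIONS), key=lambda a: q[z][a]) for q in q_tables]


def run_episode(rng, q_tables, latent_probs, steps=5):
    """Sample z once at the start, then keep it fixed for every step."""
    z = sample_latent(rng, latent_probs)
    actions = [greedy_joint_action(q_tables, z) for _ in range(steps)]
    return z, actions


if __name__ == "__main__":
    rng = random.Random(0)
    tables = make_q_tables(rng)
    z, actions = run_episode(rng, tables, latent_probs=[0.5, 0.5])
    print("latent:", z)
    print("joint actions per step:", actions)
```

Because the toy utilities ignore observations, the joint action is identical at every step of an episode and changes only when a different latent value is drawn. That is a deliberately exaggerated picture of the committed, temporally extended exploration the abstract describes.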

research
02/10/2021

Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning

Value-based methods of multi-agent reinforcement learning (MARL), especi...
research
03/16/2023

Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning

Efficient exploration is critical in cooperative deep Multi-Agent Reinfo...
research
12/27/2022

Strangeness-driven Exploration in Multi-Agent Reinforcement Learning

Efficient exploration strategy is one of essential issues in cooperative...
research
07/05/2023

Multi-Agent Cooperation via Unsupervised Learning of Joint Intentions

The field of cooperative multi-agent reinforcement learning (MARL) has s...
research
09/19/2021

Regularize! Don't Mix: Multi-Agent Reinforcement Learning without Explicit Centralized Structures

We propose using regularization for Multi-Agent Reinforcement Learning r...
research
02/16/2023

Model-Based Decentralized Policy Optimization

Decentralized policy optimization has been commonly used in cooperative ...
research
12/14/2022

Hierarchical Strategies for Cooperative Multi-Agent Reinforcement Learning

Adequate strategizing of agents behaviors is essential to solving cooper...
