-
QOPT: Optimistic Value Function Decentralization for Cooperative Multi-Agent Reinforcement Learning
We propose a novel value-based algorithm for cooperative multi-agent rei...
-
UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
This paper focuses on cooperative value-based multi-agent reinforcement ...
-
Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration
Exploration efficiency is a challenging problem in multi-agent reinforce...
-
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
In many real-world settings, a team of agents must coordinate their beha...
-
QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
We explore value-based solutions for multi-agent reinforcement learning ...
-
Multi-Agent Reinforcement Learning with Emergent Roles
The role concept provides a useful tool to design and understand complex...
-
ROMA: Multi-Agent Reinforcement Learning with Emergent Roles
The role concept provides a useful tool to design and understand complex...
MAVEN: Multi-Agent Variational Exploration
Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments [43]. We specifically focus on QMIX [40], the current state-of-the-art in this domain. We show that the representational constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. Furthermore, we propose a novel approach called MAVEN that hybridises value and policy-based methods by introducing a latent space for hierarchical control. The value-based agents condition their behaviour on the shared latent variable controlled by a hierarchical policy. This allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks. Our experimental results show that MAVEN achieves significant performance improvements on the challenging SMAC domain [43].
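The idea of agents conditioning their behaviour on a shared latent variable can be illustrated with a toy sketch. The code below is not the MAVEN implementation: the linear per-agent utilities, the categorical stand-in for the hierarchical policy, the random observations, and all names and sizes (`W`, `sample_latent`, `N_LATENT`, and so on) are assumptions made purely for illustration. It shows only the control flow the abstract describes: a latent value is drawn once per episode and held fixed, so every agent commits to the same exploration mode for the whole episode.

```python
import numpy as np

# Hypothetical problem sizes, chosen only for illustration.
N_AGENTS, N_ACTIONS, OBS_DIM, N_LATENT = 3, 4, 5, 2

# Assumed toy model: one linear utility per agent and per latent value,
# Q_a(obs, z) = W[a, z] @ obs. A real agent would use a learned network.
W = np.random.default_rng(0).normal(size=(N_AGENTS, N_LATENT, N_ACTIONS, OBS_DIM))


def sample_latent(logits, rng):
    """Stand-in hierarchical policy: a categorical distribution over z."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


def agent_q_values(agent, obs, z):
    """Per-agent action values conditioned on the shared latent z."""
    return W[agent, z] @ obs


def greedy_joint_action(observations, z):
    """Decentralised execution: each agent maximises its own utility."""
    return [
        int(np.argmax(agent_q_values(a, observations[a], z)))
        for a in range(N_AGENTS)
    ]


def run_episode(logits, n_steps, rng):
    """Draw z once, then keep it fixed for every step of the episode."""
    z = sample_latent(logits, rng)
    joint_actions = []
    for _ in range(n_steps):
        # Random observations stand in for an environment (assumption).
        observations = rng.normal(size=(N_AGENTS, OBS_DIM))
        joint_actions.append(greedy_joint_action(observations, z))
    return z, joint_actions


if __name__ == "__main__":
    z, joint_actions = run_episode(np.zeros(N_LATENT), 5, np.random.default_rng(1))
    print("latent:", z)
    print("joint actions:", joint_actions)
```

Because `z` is sampled outside the step loop, changing it switches the behaviour of all agents at once. This is one simple way to picture the temporally extended, committed exploration mentioned above, as opposed to each agent exploring independently at every step.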