Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning

by Raphaël Avalos, et al.

Multi-agent reinforcement learning (MARL) enables us to create adaptive agents in challenging environments, even when the agents have limited observation. Modern MARL methods have hitherto focused on finding factorized value functions. While this approach has proven successful, the resulting methods have convoluted network structures. We take a radically different approach and build on the structure of independent Q-learners. Inspired by influence-based abstraction, we start from the observation that compact representations of the observation-action histories can be sufficient to learn close-to-optimal decentralized policies. Combining this observation with a dueling architecture, our algorithm, LAN, represents these policies as separate individual advantage functions w.r.t. a centralized critic. These local advantage networks condition only on a single agent's local observation-action history. The centralized value function conditions on the agents' representations as well as the full state of the environment. The value function, which is cast aside before execution, serves as a stabilizer that coordinates learning and formulates DQN targets during training. In contrast with other methods, this enables LAN to keep the number of parameters of its centralized network independent of the number of agents, without imposing additional constraints such as monotonic value functions. When evaluated on the StarCraft Multi-Agent Challenge benchmark, LAN shows state-of-the-art performance and scores more than 80% wins on the two previously unsolved maps `corridor' and `3s5z_vs_3s6z', leading to an improvement of 10% over QPLEX in average performance across the 14 maps. Moreover, when the number of agents becomes large, LAN uses significantly fewer parameters than QPLEX or even QMIX. We thus show that LAN's structure forms a key improvement that helps MARL methods remain scalable.
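The dueling decomposition described above can be sketched in a few lines. The snippet below is a minimal illustrative sketch, not the authors' implementation: the dimensions, the toy linear encoders, and the function names (`lan_q_values` etc.) are all hypothetical, and a real agent would use recurrent history encoders trained with DQN-style targets. It only demonstrates the structural idea: each agent's Q-values are a shared centralized value V(s, h_1..h_n) plus a local advantage A_i(h_i, a_i), and since V is a per-timestep constant, greedy action selection at execution time needs only the local advantage network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes for illustration only.
N_AGENTS, OBS_DIM, STATE_DIM, HID, N_ACTIONS = 3, 8, 12, 16, 5

# Toy weights standing in for trained networks (a real LAN-style agent
# would learn recurrent per-agent encoders and train with DQN targets).
W_enc = rng.normal(size=(N_AGENTS, OBS_DIM, HID)) * 0.1        # per-agent history encoders
W_adv = rng.normal(size=(N_AGENTS, HID, N_ACTIONS)) * 0.1      # local advantage heads
W_val = rng.normal(size=(STATE_DIM + N_AGENTS * HID, 1)) * 0.1  # centralized value head

def lan_q_values(obs, state):
    """Dueling decomposition: Q_i(h_i, a_i) = V(s, h_1..h_n) + A_i(h_i, a_i)."""
    h = np.tanh(np.einsum('no,noh->nh', obs, W_enc))   # compact agent representations
    adv = np.einsum('nh,nha->na', h, W_adv)            # local advantages, one row per agent
    # Centralized value conditions on the full state AND all agent representations.
    v = np.concatenate([state, h.ravel()]) @ W_val     # scalar V, shape (1,)
    return v + adv                                     # broadcast V over agents and actions

obs = rng.normal(size=(N_AGENTS, OBS_DIM))    # per-agent local observations
state = rng.normal(size=STATE_DIM)            # full environment state (training only)
q = lan_q_values(obs, state)

# At execution time V is cast aside: argmax over Q equals argmax over A alone,
# so each agent can act from its local advantage network only.
greedy = q.argmax(axis=1)
print(q.shape, greedy)
```

Note that the centralized value head's parameter count grows linearly with the concatenated representation size, while the per-agent advantage heads are independent of the number of agents, which mirrors the scalability argument in the abstract.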




