QPLEX: Duplex Dueling Multi-Agent Q-Learning

08/03/2020
by Jianhao Wang, et al.

We explore value-based multi-agent reinforcement learning (MARL) in the popular paradigm of centralized training with decentralized execution (CTDE). A central concept in CTDE is the Individual-Global-Max (IGM) principle, which requires consistency between joint and local action selection so that each agent can act efficiently on its own local decisions. To achieve scalability, however, existing MARL methods either limit the representational expressiveness of their value function classes or relax the IGM consistency, which can cause instability or poor performance. This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which uses a duplex dueling network architecture to factorize the joint value function. This duplex dueling structure encodes the IGM principle into the neural network architecture and thus enables efficient value function learning. Theoretical analysis shows that QPLEX achieves a complete IGM function class. Empirical experiments on StarCraft II micromanagement tasks demonstrate that QPLEX significantly outperforms state-of-the-art baselines in both online and offline data collection settings. They also show that QPLEX achieves high sample efficiency and can benefit from offline datasets without additional online exploration.
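
To make the IGM principle concrete, the following minimal sketch shows one way a dueling-style factorization can satisfy IGM by construction. It is an illustration under stated assumptions, not the QPLEX network itself: the tabular utilities `q_i`, the positive weight table `lam`, and the helper `build_joint_q` are hypothetical names introduced here. The idea is that each agent's advantage is non-positive and is zero only at its greedy action, so any strictly positive weighting of the advantages leaves the joint greedy action equal to the tuple of local greedy actions.

```python
import itertools
import numpy as np


def build_joint_q(q_i, lam):
    """Build a joint value table from per-agent utilities (hypothetical helper).

    q_i: array of shape (n_agents, n_actions), per-agent utilities.
    lam: array of shape (n_actions,)*n_agents + (n_agents,), strictly
         positive weights that may depend on the joint action (assumption).
    """
    n_agents, n_actions = q_i.shape
    v_i = q_i.max(axis=1)              # per-agent state value
    a_i = q_i - v_i[:, None]           # per-agent advantage, always <= 0
    q_tot = np.empty((n_actions,) * n_agents)
    for joint in itertools.product(range(n_actions), repeat=n_agents):
        adv = a_i[np.arange(n_agents), list(joint)]
        # Joint value = sum of values + positively weighted advantages.
        q_tot[joint] = v_i.sum() + np.dot(lam[joint], adv)
    return q_tot


rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4
q_i = rng.normal(size=(n_agents, n_actions))
lam = rng.uniform(0.1, 2.0, size=(n_actions,) * n_agents + (n_agents,))

q_tot = build_joint_q(q_i, lam)

# IGM check: the joint argmax should equal the tuple of local argmaxes.
greedy_local = tuple(int(a) for a in q_i.argmax(axis=1))
greedy_joint = tuple(int(a) for a in np.unravel_index(q_tot.argmax(), q_tot.shape))
igm_holds = greedy_local == greedy_joint
print("IGM holds:", igm_holds)
```

Because the weights only rescale non-positive advantages, the check passes for any strictly positive `lam`; replacing `lam` with weights that can be negative is a quick way to see the consistency break.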


