DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning

02/21/2022
by Jian Zhao, et al.

In cooperative multi-agent tasks, a team of agents jointly interacts with an environment by taking actions, receiving a team reward, and observing the next state. During these interactions, uncertainty in the environment and reward inevitably induces stochasticity in the long-term returns, and this randomness is exacerbated as the number of agents grows. However, most existing value-based multi-agent reinforcement learning (MARL) methods model only the expectations of the individual Q-values and the global Q-value, ignoring such randomness. Rather than modeling only the expectations of the long-term returns, it is preferable to capture the stochasticity directly by estimating the returns as distributions. With this motivation, this work proposes DQMIX, a novel value-based MARL method, from a distributional perspective. Specifically, we model each individual Q-value with a categorical distribution. To integrate these individual Q-value distributions into the global Q-value distribution, we design a distribution mixing network built on five basic operations on distributions. We further prove that DQMIX satisfies the Distributional-Individual-Global-Max (DIGM) principle with respect to the expectation of the distribution, which guarantees consistency between the joint and individual greedy action selections under the global Q-value and the individual Q-values. To validate DQMIX, we demonstrate its ability to factorize a matrix game with stochastic rewards. Furthermore, experimental results on a challenging set of StarCraft II micromanagement tasks show that DQMIX consistently outperforms value-based multi-agent reinforcement learning baselines.
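The core idea of representing a Q-value as a categorical distribution over a fixed support, and acting greedily with respect to its expectation (the quantity the DIGM guarantee is stated in terms of), can be sketched as follows. This is an illustrative C51-style sketch, not the authors' implementation; the names `N_ATOMS`, `V_MIN`, `V_MAX`, and the support bounds are assumptions.

```python
import numpy as np

# Illustrative sketch (not the authors' code): each agent's Q-value is a
# categorical distribution over a fixed support of return "atoms".
# N_ATOMS and the [V_MIN, V_MAX] range are assumed values.
N_ATOMS = 51
V_MIN, V_MAX = -10.0, 10.0
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)  # shared support z_1, ..., z_K

def expected_q(probs):
    """Expectation of a categorical return distribution: E[Z] = sum_k p_k * z_k."""
    return float(np.dot(probs, ATOMS))

def greedy_action(dist_per_action):
    """Select the action whose return distribution has the highest expectation.

    dist_per_action: array of shape (n_actions, N_ATOMS); each row sums to 1.
    Acting greedily w.r.t. E[Z] is what the DIGM consistency property
    is stated in terms of.
    """
    return int(np.argmax(dist_per_action @ ATOMS))

# Example: two actions with deterministic (one-hot) return distributions.
dists = np.zeros((2, N_ATOMS))
dists[0, 10] = 1.0  # all mass at ATOMS[10] = -6.0
dists[1, 40] = 1.0  # all mass at ATOMS[40] = +6.0
assert greedy_action(dists) == 1
```

In DQMIX the individual distributions produced this way are further combined by the mixing network's operations into a global return distribution; the sketch above only covers the per-agent representation and the expectation-based greedy step.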

Related research:

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning (09/09/2020)
Distributional Reinforcement Learning with Quantile Regression (10/27/2017)
Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning (03/03/2023)
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning (03/30/2018)
A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning (06/04/2023)
CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning (03/16/2022)
RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents (02/16/2021)
