QOPT: Optimistic Value Function Decentralization for Cooperative Multi-Agent Reinforcement Learning

06/22/2020
by Kyunghwan Son, et al.

We propose a novel value-based algorithm for cooperative multi-agent reinforcement learning under the paradigm of centralized training with decentralized execution. The proposed algorithm, coined QOPT, is based on an "optimistic" training scheme that uses two action-value estimators with separate roles: (i) true action-value estimation and (ii) decentralization of the optimal action. By construction, our framework allows the latter estimator to achieve (ii) while representing a richer class of joint action-value functions than the prior state-of-the-art algorithm, QMIX. Our experiments demonstrate that QOPT achieves new state-of-the-art performance in the StarCraft Multi-Agent Challenge environment. In particular, QOPT significantly outperforms the baselines when non-cooperative behaviors are penalized more aggressively.
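
What "decentralization of the optimal action" means can be illustrated with a toy two-agent matrix game. The sketch below is not QOPT: the payoff table, all function names, and both factorizations are hypothetical illustrations. It contrasts an additive (VDN-style) factorization, the simplest monotonic special case of the kind of mixing QMIX uses, with a simple "optimistic" projection in which each agent assumes its partner will pick the best response. The latter is only one generic notion of optimism and is not claimed to be QOPT's training scheme. Under a heavy miscoordination penalty, the additive fit steers both agents to the safe but suboptimal action.

```python
import numpy as np

# Hypothetical 2-agent, 3-action payoff table: coordinating on (0, 0) pays 8,
# but pairing action 0 with any other action is penalized by -12.
PAYOFF = np.array([
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
])


def optimal_joint_action(q_joint):
    """Joint action that maximizes the (true) joint action-value table."""
    i, j = np.unravel_index(np.argmax(q_joint), q_joint.shape)
    return int(i), int(j)


def decentralized_action(q1, q2):
    """Each agent greedily maximizes its own utility, without communication."""
    return int(np.argmax(q1)), int(np.argmax(q2))


def additive_utilities(q_joint):
    """Least-squares additive fit Q(a1, a2) ~ q1[a1] + q2[a2].

    For a full two-way table the fitted per-agent terms equal the row and
    column means up to constants, which do not change any argmax.
    """
    return q_joint.mean(axis=1), q_joint.mean(axis=0)


def optimistic_utilities(q_joint):
    """Each agent scores an action by its best case over the partner's actions."""
    return q_joint.max(axis=1), q_joint.max(axis=0)


def decentralizes_optimum(q_joint, q1, q2):
    """True when the per-agent greedy choices recover the joint optimum."""
    return decentralized_action(q1, q2) == optimal_joint_action(q_joint)


if __name__ == "__main__":
    print("optimal:", optimal_joint_action(PAYOFF))
    for name, fit in [("additive", additive_utilities),
                      ("optimistic", optimistic_utilities)]:
        q1, q2 = fit(PAYOFF)
        a = decentralized_action(q1, q2)
        print(f"{name}: picks {a}, payoff {PAYOFF[a]}, "
              f"decentralizes optimum: {decentralizes_optimum(PAYOFF, q1, q2)}")
```

In this table the additive fit makes both agents prefer action 1 (row and column mean -4 versus about -5.33 for action 0), yielding payoff 0 instead of 8, while the max-over-partner projection recovers (0, 0). The max projection is not a general fix: with several optimal joint actions, agents could still miscoordinate.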

research
08/15/2022

Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning in StarCraft

The StarCraft II Multi-Agent Challenge (SMAC) was created to be a challe...
research
05/14/2019

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning

We explore value-based solutions for multi-agent reinforcement learning ...
research
01/18/2021

Cooperative and Competitive Biases for Multi-Agent Reinforcement Learning

Training a multi-agent reinforcement learning (MARL) algorithm is more c...
research
10/09/2020

Graph Convolutional Value Decomposition in Multi-Agent Reinforcement Learning

We propose a novel framework for value function factorization in multi-a...
research
10/06/2020

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

This paper focuses on cooperative value-based multi-agent reinforcement ...
research
12/22/2020

QVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning

This paper introduces four new algorithms that can be used for tackling ...
research
12/09/2021

Value Function Factorisation with Hypergraph Convolution for Cooperative Multi-agent Reinforcement Learning

Cooperation between agents in a multi-agent system (MAS) has become a ho...
