Regularized Softmax Deep Multi-Agent Q-Learning

03/22/2021
by Ling Pan, et al.

Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from more severe overestimation in practice than previously acknowledged, and that this overestimation is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline, and we demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
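
The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation, which the abstract does not spell out: the function names, the temperature `beta`, the penalty weight `lam`, and the choice of a squared penalty are all assumptions made for illustration. The sketch also enumerates a short list of values directly and does not attempt the efficient multi-agent approximation the abstract mentions.

```python
import math


def softmax_value(q_values, beta=1.0):
    """Boltzmann-softmax-weighted average of Q-values.

    Hypothetical helper: beta is an assumed inverse temperature.
    beta = 0 gives the plain mean; large beta approaches max(q_values).
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def regularized_td_loss(q_joint, target, baseline, lam=0.1):
    """Squared TD error plus a penalty on deviation from a baseline.

    Hypothetical form: lam and the squared penalty are assumptions,
    chosen only to show the idea of discouraging joint action-values
    that drift far from a reference value.
    """
    td_error = q_joint - target
    penalty = q_joint - baseline
    return td_error ** 2 + lam * penalty ** 2


if __name__ == "__main__":
    next_q = [1.0, 2.0, 3.0]
    # The softmax estimate never exceeds the max, so a bootstrapped
    # target built from it is no larger than the max-based target.
    print(max(next_q), softmax_value(next_q, beta=1.0))
    print(regularized_td_loss(q_joint=2.0, target=1.0, baseline=1.5, lam=0.5))
```

Because the softmax-weighted value is a convex combination of the listed Q-values, it is bounded above by their maximum, which is the intuition behind using it to temper overestimation in the bootstrapped target.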


research
12/06/2022

Curriculum Learning for Relative Overgeneralization

In multi-agent reinforcement learning (MARL), many popular methods, such...
research
08/16/2020

The reinforcement learning-based multi-agent cooperative approach for the adaptive speed regulation on a metallurgical pickling line

We present a holistic data-driven approach to the problem of productivit...
research
10/01/2021

Divergence-Regularized Multi-Agent Actor-Critic

Entropy regularization is a popular method in reinforcement learning (RL...
research
09/19/2023

Multicopy Reinforcement Learning Agents

This paper examines a novel type of multi-agent problem, in which an age...
research
05/17/2019

A Regularized Opponent Model with Maximum Entropy Objective

In a single-agent setting, reinforcement learning (RL) tasks can be cast...
research
02/23/2023

Revisiting the Gumbel-Softmax in MADDPG

MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that...
research
12/13/2017

Multi-focus Attention Network for Efficient Deep Reinforcement Learning

Deep reinforcement learning (DRL) has shown incredible performance in le...
