Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

02/09/2022
by   Jian Zhao, et al.
11

In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards. In the absence of individual reward signals, credit assignment mechanisms are usually introduced to discriminate the contributions of different agents so as to achieve effective cooperation. Recently, the value decomposition paradigm has been widely adopted to realize credit assignment, and QMIX has become the state-of-the-art solution. In this paper, we revisit QMIX from two aspects. First, we propose a new perspective on credit assignment measurement and empirically show that QMIX suffers limited discriminability on the assignment of credits to agents. Second, we propose a gradient entropy regularization with QMIX to realize a discriminative credit assignment, thereby improving the overall performance. The experiments demonstrate that our approach can comparatively improve learning efficiency and achieve better performance.

READ FULL TEXT
research
06/01/2021

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Centralized Training with Decentralized Execution (CTDE) has been a popu...
research
10/10/2022

Learning Credit Assignment for Cooperative Reinforcement Learning

Cooperative multi-agent policy gradient (MAPG) algorithms have recently ...
research
08/02/2019

Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning

This paper proposes a definition of system health in the context of mult...
research
07/06/2020

Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning

We present a multi-agent actor-critic method that aims to implicitly add...
research
11/23/2022

Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition

Value Decomposition (VD) aims to deduce the contributions of agents for ...
research
04/29/2018

From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction

In this work, we study the credit assignment problem in reward augmented...
research
06/20/2023

Discovering Causality for Efficient Cooperation in Multi-Agent Environments

In cooperative Multi-Agent Reinforcement Learning (MARL) agents are requ...

Please sign up or login with your details

Forgot password? Click here to reset