Greedy-based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

12/08/2021
by Lipeng Wan, et al.

Because of the limited representational capacity of the joint Q-value function, multi-agent reinforcement learning (MARL) methods with linear or monotonic value decomposition suffer from relative overgeneralization and therefore cannot guarantee optimal coordination. Existing methods address relative overgeneralization either by achieving complete expressiveness or by learning a bias, but neither is sufficient to solve the problem. In this paper, we propose optimal consistency, a criterion for evaluating the optimality of coordination. To achieve optimal consistency, we introduce the True-Global-Max (TGM) principle for linear and monotonic value decomposition; the TGM principle is ensured when the optimal stable point is the unique stable point. We therefore propose greedy-based value representation (GVR), which ensures the optimal stable point via inferior target shaping and eliminates non-optimal stable points via superior experience replay. Theoretical proofs and empirical results demonstrate that our method ensures optimal consistency under sufficient exploration. In experiments on various benchmarks, GVR significantly outperforms state-of-the-art baselines.
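
The relative overgeneralization described in the abstract can be illustrated with a small one-step matrix game. The sketch below is not the paper's method or code: the payoff matrix, the function name `fit_linear_decomposition`, and the assumption of uniformly sampled joint actions are all hypothetical choices made for illustration. It fits a linear decomposition Q_tot(a1, a2) = q1[a1] + q2[a2] by least squares and compares the resulting greedy joint action with the true optimum.

```python
import numpy as np

# Hypothetical 3x3 cooperative matrix game (illustrative payoffs, not from the paper).
# The optimum (0, 0) pays 8, but miscoordinating on action 0 is heavily penalized.
payoff = np.array([
    [  8.0, -12.0, -12.0],
    [-12.0,   0.0,   0.0],
    [-12.0,   0.0,   0.0],
])


def fit_linear_decomposition(payoff):
    """Least-squares fit of Q_tot(a1, a2) = q1[a1] + q2[a2].

    Assumes every joint action is sampled equally often. In that case the
    best additive fit is row mean + column mean - grand mean, which is
    split here evenly between the two per-agent utilities.
    """
    grand_mean = payoff.mean()
    q1 = payoff.mean(axis=1) - grand_mean / 2.0  # agent 1: one value per row action
    q2 = payoff.mean(axis=0) - grand_mean / 2.0  # agent 2: one value per column action
    return q1, q2


q1, q2 = fit_linear_decomposition(payoff)

# Each agent acts greedily on its own utility, as in decentralized execution.
greedy = (int(np.argmax(q1)), int(np.argmax(q2)))
optimal = tuple(int(i) for i in np.unravel_index(np.argmax(payoff), payoff.shape))

print("greedy joint action:", greedy, "payoff:", payoff[greedy])
print("optimal joint action:", optimal, "payoff:", payoff[optimal])
```

In this toy setting each agent's utility for action 0 is dragged down by the penalties incurred when the other agent does not cooperate, so the greedy joint action settles on a payoff of 0 rather than the optimal 8. The decomposed representation and the true maximum disagree, which is the kind of inconsistency the abstract says GVR is designed to remove.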

research
08/03/2020

QPLEX: Duplex Dueling Multi-Agent Q-Learning

We explore value-based multi-agent reinforcement learning (MARL) in the ...
research
02/14/2023

Adaptive Value Decomposition with Greedy Marginal Contribution Computation for Cooperative Multi-Agent Reinforcement Learning

Real-world cooperation often requires intensive coordination among agent...
research
03/19/2020

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

In many real-world settings, a team of agents must coordinate its behavi...
research
05/12/2023

Boosting Value Decomposition via Unit-Wise Attentive State Representation for Cooperative Multi-Agent Reinforcement Learning

In cooperative multi-agent reinforcement learning (MARL), the environmen...
research
09/27/2019

Deep Coordination Graphs

This paper introduces the deep coordination graph (DCG) for collaborativ...
research
07/05/2023

Multi-Agent Cooperation via Unsupervised Learning of Joint Intentions

The field of cooperative multi-agent reinforcement learning (MARL) has s...
research
08/07/2022

Maximum Correntropy Value Decomposition for Multi-agent Deep Reinforcemen Learning

We explore value decomposition solutions for multi-agent deep reinforcem...
