Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

02/10/2020
by Yaodong Yang, et al.

Recently, deep multiagent reinforcement learning (MARL) has become a highly active research area, as many real-world problems can naturally be viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate its behavior conditioned on private observations and a commonly shared global reward signal. One natural solution is the centralized training and decentralized execution paradigm. During centralized training, one key challenge is multiagent credit assignment: how to allocate the global reward among individual agent policies so that the agents coordinate better toward maximizing system-level benefit. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into individual agents' Q-values. Unlike previous works, which restrict the representational relation between the individual Q-values and the global one, we bring the integrated gradients attribution technique into deep MARL to directly decompose global Q-values along trajectory paths and assign credit to agents. We evaluate QPD on the challenging StarCraft II micromanagement tasks and show that QPD achieves state-of-the-art performance in both homogeneous and heterogeneous multiagent scenarios compared with existing cooperative MARL algorithms.
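To make the attribution idea in the abstract concrete, here is a minimal sketch of integrated gradients applied to a toy global Q-function. It is not the paper's implementation. The function `toy_global_q`, the two-features-per-agent layout, the all-zero baseline, and the straight-line integration path are all assumptions made for illustration, whereas QPD as described integrates along trajectory paths. The sketch only shows the general property that per-feature attributions, grouped by agent, sum to the change in the global value.

```python
import math


def toy_global_q(x):
    """Hypothetical global Q-value over 4 features (2 per agent); not from the paper."""
    a0, a1, b0, b1 = x
    return math.tanh(a0 * b0) + 0.5 * a1 ** 2 + 0.3 * b1


def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of f at x (stands in for autodiff)."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Attribute f(x) - f(baseline) to each feature along a straight-line path.

    Uses a midpoint Riemann sum to approximate the path integral.
    """
    attributions = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_grad(f, point)
        for i in range(len(x)):
            attributions[i] += grad[i] * (x[i] - baseline[i]) / steps
    return attributions


def per_agent_credit(attributions, agent_slices):
    """Sum feature attributions over each agent's (assumed) feature slice."""
    return [sum(attributions[s]) for s in agent_slices]


x = [0.8, -0.4, 1.2, 0.5]          # joint features: agent 0 then agent 1
baseline = [0.0, 0.0, 0.0, 0.0]    # assumed reference point
attr = integrated_gradients(toy_global_q, x, baseline)
credits = per_agent_credit(attr, [slice(0, 2), slice(2, 4)])

# Completeness check: the credits should add up to the change in global Q.
total_change = toy_global_q(x) - toy_global_q(baseline)
print(credits, sum(credits), total_change)
```

The completeness property checked at the end is what makes this kind of attribution a candidate for credit assignment: the per-agent credits are an additive decomposition of the global value difference, without constraining how the global function combines agent features.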


research
06/01/2021

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Centralized Training with Decentralized Execution (CTDE) has been a popu...
research
02/10/2020

Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

In many real-world settings, a team of cooperative agents must learn to ...
research
05/25/2022

Scalable Multi-Agent Model-Based Reinforcement Learning

Recent Multi-Agent Reinforcement Learning (MARL) literature has been lar...
research
09/10/2019

Signal Instructed Coordination in Cooperative Multi-agent Reinforcement Learning

In many real-world problems, a team of agents need to collaborate to max...
research
02/16/2021

RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

Current value-based multi-agent reinforcement learning methods optimize ...
research
02/24/2021

Credit Assignment with Meta-Policy Gradient for Multi-Agent Reinforcement Learning

Reward decomposition is a critical problem in centralized training with ...
research
12/27/2021

Multiagent Model-based Credit Assignment for Continuous Control

Deep reinforcement learning (RL) has recently shown great promise in rob...
