Learning Credit Assignment for Cooperative Reinforcement Learning

10/10/2022
by Wubing Chen, et al.

Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for multi-agent systems. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG uses a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, thus enabling efficient credit assignment in MAPG. Theoretically, we prove that the individual policies of MAPPG can converge to the global optimum. Empirically, we evaluate MAPPG on the well-known matrix game and differential game, and verify that it can converge to the global optimum in both discrete and continuous action spaces. We also evaluate MAPPG on a set of StarCraft II micromanagement tasks and demonstrate that it outperforms state-of-the-art MAPG algorithms.
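
To make the credit-assignment difficulty concrete, the following is a minimal sketch of how an agent that scores its own actions against an exploring partner can miss the jointly optimal action in a one-step cooperative matrix game. The payoff values, the uniform partner policy, and the helper names `joint_optimum` and `marginal_values` are illustrative assumptions; they are not taken from the paper, and the sketch does not implement MAPPG or its polarization function.

```python
import numpy as np

# Hypothetical 3x3 cooperative payoff matrix (assumed for illustration):
# rows are agent 1's actions, columns are agent 2's actions.
PAYOFF = np.array([
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
])


def joint_optimum(payoff):
    """Return the (row, column) index of the highest-payoff joint action."""
    flat_index = np.argmax(payoff)
    return tuple(int(i) for i in np.unravel_index(flat_index, payoff.shape))


def marginal_values(payoff, partner_policy):
    """Expected payoff of each row action when the partner samples
    its column action from partner_policy."""
    return payoff @ partner_policy


# Assumption: the partner is still exploring uniformly at random.
uniform = np.full(3, 1.0 / 3.0)
values = marginal_values(PAYOFF, uniform)

# The action an individual agent would prefer under that partner policy.
greedy_individual = int(np.argmax(values))

print("best joint action:", joint_optimum(PAYOFF))
print("agent 1 marginal values:", values)
print("agent 1 greedy action:", greedy_individual)
```

In this sketch the best joint action is the top-left cell, yet agent 1's marginal value for its first action is the lowest of the three, so a purely individual greedy choice moves away from the global optimum. This is one simple way to see why the consistency between joint and individual optimal actions matters.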

research
12/23/2021

Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling

One of the preeminent obstacles to scaling multi-agent reinforcement lea...
research
06/01/2021

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Centralized Training with Decentralized Execution (CTDE) has been a popu...
research
02/09/2022

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

In cooperative multi-agent systems, agents jointly take actions and rece...
research
10/17/2022

Multi-Agent Automated Machine Learning

In this paper, we propose multi-agent automated machine learning (MA2ML)...
research
05/20/2022

Decentralized Autonomous Organizations for Tax Credit's Tracking

Tax credit stimulus and fiscal bonuses had a very important impact on It...
research
02/24/2021

Credit Assignment with Meta-Policy Gradient for Multi-Agent Reinforcement Learning

Reward decomposition is a critical problem in centralized training with ...
research
06/02/2022

RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning

In recent years, reinforcement learning has faced several challenges in ...
