Credit Assignment with Meta-Policy Gradient for Multi-Agent Reinforcement Learning

02/24/2021
by Jianzhun Shao, et al.

Reward decomposition is a critical problem in the centralized training with decentralized execution (CTDE) paradigm for multi-agent reinforcement learning. To take full advantage of global information, meaning the states of all agents and of the related environment, when decomposing Q values into individual credits, we propose a general meta-learning-based framework, Mixing Network with Meta Policy Gradient (MNMPG), that distills the global hierarchy for fine-grained reward decomposition. The excitation signal for learning the global hierarchy is derived from the difference in episode reward before and after "exercise updates" of the utility network. Our method is generally applicable to CTDE methods that use a monotonic mixing network. Experiments on the StarCraft II micromanagement benchmark show that our method, with only a simple utility network, outperforms the current state-of-the-art MARL algorithms on 4 of 5 super hard scenarios. Performance improves further when it is combined with a role-based utility network.
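To make the two ingredients named in the abstract concrete, here is a minimal sketch under stated assumptions. It is not the MNMPG implementation, whose details are not given here. The mixer is a QMIX-style monotonic mixing network: state-conditioned hypernetwork weights are passed through `abs()` so the joint value Q_tot never decreases when any single agent's Q_i increases. The class name `MonotonicMixer`, its single-linear-layer hypernetworks, and the random initialization are all hypothetical choices for illustration. The function `excitation_signal` is likewise a hypothetical simplification that treats the excitation as the change in mean episode return measured before and after an exercise update.

```python
import numpy as np


def elu(x):
    # ELU is non-decreasing, so it preserves monotonicity; expm1 on the clipped
    # input avoids overflow warnings for large positive values.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


class MonotonicMixer:
    """Hypothetical QMIX-style mixer: Q_tot is non-decreasing in every agent's Q_i."""

    def __init__(self, n_agents, state_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Illustrative hypernetworks: one linear map from the global state each.
        self.hyper_w1 = rng.normal(size=(state_dim, n_agents * embed_dim))
        self.hyper_b1 = rng.normal(size=(state_dim, embed_dim))
        self.hyper_w2 = rng.normal(size=(state_dim, embed_dim))
        self.hyper_b2 = rng.normal(size=(state_dim,))

    def __call__(self, agent_qs, state):
        # abs() keeps mixing weights non-negative, which enforces monotonicity.
        w1 = np.abs(state @ self.hyper_w1).reshape(self.n_agents, self.embed_dim)
        b1 = state @ self.hyper_b1
        hidden = elu(agent_qs @ w1 + b1)
        w2 = np.abs(state @ self.hyper_w2)
        b2 = state @ self.hyper_b2  # biases may be any sign
        return float(hidden @ w2 + b2)


def excitation_signal(returns_before, returns_after):
    """Hypothetical: mean episode-return change caused by an 'exercise update'."""
    return float(np.mean(returns_after) - np.mean(returns_before))


if __name__ == "__main__":
    mixer = MonotonicMixer(n_agents=3, state_dim=4)
    state = np.array([0.5, -0.2, 0.1, 0.3])
    qs = np.array([1.0, 0.5, -0.3])
    print("Q_tot:", mixer(qs, state))
    print("excitation:", excitation_signal([1.0, 3.0], [2.0, 4.0]))
```

The monotonicity constraint is what lets each agent act greedily on its own utility at execution time while the mixer stays consistent with the joint greedy action; because the sketch places no constraint on the biases, the state can still shift Q_tot freely.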

research
08/02/2019

Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning

This paper proposes a definition of system health in the context of mult...
research
10/17/2022

PTDE: Personalized Training with Distillated Execution for Multi-Agent Reinforcement Learning

Centralized Training with Decentralized Execution (CTDE) has been a very...
research
10/10/2022

Learning Credit Assignment for Cooperative Reinforcement Learning

Cooperative multi-agent policy gradient (MAPG) algorithms have recently ...
research
07/11/2019

Rethink Global Reward Game and Credit Assignment in Multi-agent Reinforcement Learning

Cooperative game is a critical research area in multi-agent reinforcemen...
research
03/16/2022

CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning

Due to the partial observability and communication constraints in many m...
research
03/07/2019

Concurrent Meta Reinforcement Learning

State-of-the-art meta reinforcement learning algorithms typically assume...
research
02/10/2020

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Recently, deep multiagent reinforcement learning (MARL) has become a hig...