Towards Global Optimality in Cooperative MARL with Sequential Transformation

07/12/2022
by Jianing Ye, et al.

Policy learning in multi-agent reinforcement learning (MARL) is challenging because the joint state-action space grows exponentially with the number of agents. To improve scalability, the paradigm of centralized training with decentralized execution (CTDE), combined with a factorized structure, is broadly adopted in MARL. However, we observe that existing CTDE algorithms in cooperative MARL cannot achieve optimality even in simple matrix games. To understand this phenomenon, we introduce a framework of Generalized Multi-Agent Actor-Critic with Policy Factorization (GPF-MAC), which characterizes the learning of factorized joint policies, i.e., joint policies in which each agent's policy depends only on its own observation-action history. We show that most popular CTDE MARL algorithms are special instances of GPF-MAC and may get stuck in a suboptimal joint policy. To address this issue, we present a novel transformation framework that reformulates a multi-agent MDP as a special "single-agent" MDP with a sequential structure, which allows off-the-shelf single-agent reinforcement learning (SARL) algorithms to efficiently learn the corresponding multi-agent tasks. This transformation carries the optimality guarantees of SARL algorithms over to cooperative MARL. To instantiate this transformation framework, we propose a Transformed PPO, called T-PPO, which can theoretically perform optimal policy learning in finite multi-agent MDPs and shows significantly better performance on a large set of cooperative multi-agent tasks.
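
The idea of recasting a cooperative matrix game as a sequential "single-agent" problem can be illustrated with a minimal sketch. This is not the paper's T-PPO: it assumes a hypothetical 3x3 payoff table, lets the agents choose one after another with the earlier choices forming the state, and solves the resulting tiny MDP by plain backward induction rather than PPO. All names (`PAYOFF`, `solve_sequential`, `rollout`) are illustrative assumptions.

```python
from itertools import product

# Hypothetical shared payoff for a 2-agent, 3-action cooperative matrix game.
# The best joint action (0, 0) is surrounded by heavy penalties.
PAYOFF = [
    [8, -12, -12],
    [-12, 0, 0],
    [-12, 0, 0],
]
N_AGENTS = 2
N_ACTIONS = 3


def team_reward(joint_action):
    """Shared reward, paid only once every agent has chosen an action."""
    return PAYOFF[joint_action[0]][joint_action[1]]


def solve_sequential(n_agents, n_actions, reward_fn):
    """Backward induction on the sequentially transformed problem.

    A state is the tuple of actions already chosen (a prefix); at depth i
    the single decision maker picks agent i's action. Intermediate rewards
    are zero, and the team reward arrives at the final step.
    """
    value = {}
    # Terminal states are complete joint actions.
    for joint in product(range(n_actions), repeat=n_agents):
        value[joint] = reward_fn(joint)

    policy = {}
    # Sweep from the last agent's decision back to the first agent's.
    for depth in range(n_agents - 1, -1, -1):
        for prefix in product(range(n_actions), repeat=depth):
            best = max(range(n_actions), key=lambda a: value[prefix + (a,)])
            policy[prefix] = best
            value[prefix] = value[prefix + (best,)]
    return policy, value


def rollout(policy, n_agents):
    """Follow the greedy policy from the empty prefix to a joint action."""
    prefix = ()
    for _ in range(n_agents):
        prefix = prefix + (policy[prefix],)
    return prefix


policy, value = solve_sequential(N_AGENTS, N_ACTIONS, team_reward)
best_joint = rollout(policy, N_AGENTS)
print(best_joint, value[()])
```

In this sketch each decision conditions on the actions already chosen, so the second agent's choice is no longer independent of the first agent's, which is the property a fully factorized joint policy lacks. The number of prefix states still grows with the number of agents, so the example only illustrates the structure of the transformation, not its scalability.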

