Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach

05/18/2021
by   Yan Li, et al.

Multi-agent reinforcement learning (MARL) becomes more challenging as the number of agents increases, because the size of the joint state and action spaces grows exponentially in the number of agents. To address this challenge of scale, we identify a class of cooperative MARL problems with permutation invariance and formulate it as a mean-field Markov decision process (MDP). To exploit this permutation invariance, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture. We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence. Moreover, its sample complexity is independent of the number of agents. We validate the theoretical advantages of MF-PPO with numerical experiments in the multi-agent particle environment (MPE). In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors with fewer model parameters, which is the key to its generalization performance.
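
To make the notion of a permutation-invariant function of agent states concrete, the following is a minimal sketch, not the architecture used in MF-PPO. It assumes a mean-pooling construction (a shared per-agent embedding followed by an average and a small output head), and the names make_layer, dense_tanh, invariant_value, phi, and rho are hypothetical, introduced only for illustration. The weights are random placeholders rather than trained parameters.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return a random dense layer as (weights, biases); values are illustrative only."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases


def dense_tanh(layer, x):
    """Apply a dense layer followed by tanh to a single input vector."""
    weights, biases = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def invariant_value(states, phi, rho):
    """Embed each agent state with phi, average the embeddings, then apply rho.

    Averaging discards the ordering of the agents, so the output is the same
    for any permutation of `states` (up to floating-point rounding).
    """
    embedded = [dense_tanh(phi, s) for s in states]
    n = len(embedded)
    pooled = [sum(column) / n for column in zip(*embedded)]
    return dense_tanh(rho, pooled)[0]


rng = random.Random(0)
phi = make_layer(2, 4, rng)  # shared per-agent embedding (assumed sizes)
rho = make_layer(4, 1, rng)  # output head acting on the pooled embedding

states = [[0.1, -0.3], [0.7, 0.2], [-0.5, 0.9]]
shuffled = [states[2], states[0], states[1]]  # same agents, different order

v1 = invariant_value(states, phi, rho)
v2 = invariant_value(shuffled, phi, rho)
print(abs(v1 - v2) < 1e-9)
```

In this sketch the parameter count depends only on the layer sizes, not on the number of agents, so the same phi and rho can be applied to a population of any size.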


