Dealing with Non-Stationarity in Multi-Agent Reinforcement Learning via Trust Region Decomposition

02/21/2021
by Wenhao Li, et al.

Non-stationarity is a thorny issue in multi-agent reinforcement learning, caused by the changes in agents' policies during learning. Existing approaches to this problem, such as centralized critic with decentralized actors (CCDA), population-based self-play, and modeling of other agents, each have limitations in effectiveness and scalability. In this paper, we introduce a δ-stationarity measurement to explicitly model the stationarity of a policy sequence, which is theoretically shown to be proportional to the joint policy divergence. However, a simple policy factorization such as the mean-field approximation can lead to a larger joint policy divergence, which we call the trust region decomposition dilemma. We model the joint policy as a general Markov random field and propose a trust region decomposition network based on message passing to estimate the joint policy divergence more accurately. We then develop the Multi-Agent Mirror descent policy algorithm with Trust region decomposition, called MAMT, which is designed to satisfy δ-stationarity. MAMT adaptively adjusts the trust regions of the local policies in an end-to-end manner, thereby approximately constraining the joint policy divergence and alleviating the non-stationarity problem. Our method brings noticeable and stable performance improvements over baselines in coordination tasks of varying complexity.
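To make the decomposition idea concrete, here is a minimal sketch (not the paper's implementation, and using hypothetical toy policies): under a mean-field factorization π(a) = Π_i π_i(a_i), the joint-policy KL divergence decomposes exactly into the sum of per-agent KLs, so a joint divergence budget δ can be enforced through the local policies alone. The abstract's point is that once agents are coupled, this naive sum misestimates the true joint divergence, which is why MAMT learns the decomposition instead.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two categorical distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def joint(pis):
    """Mean-field (product) joint distribution over all agents' actions."""
    out = np.array([1.0])
    for pi in pis:
        out = np.outer(out, pi).ravel()
    return out

# Two agents, two actions each (toy example).
old = [np.array([0.7, 0.3]), np.array([0.5, 0.5])]
new = [np.array([0.6, 0.4]), np.array([0.4, 0.6])]

# Under independence, joint KL == sum of local KLs.
joint_kl = kl(joint(new), joint(old))
sum_local = sum(kl(n, o) for n, o in zip(new, old))
assert np.isclose(joint_kl, sum_local)

# Enforce a (hypothetical) delta-stationarity budget by interpolating each
# new local policy toward its old one until the summed local KL fits.
delta = 0.01
alpha = 1.0
while sum(kl(alpha * n + (1 - alpha) * o, o) for n, o in zip(new, old)) > delta:
    alpha *= 0.9
```

Under coupled (non-factorized) policies the equality in the assertion breaks down, motivating the message-passing estimator over a Markov random field described above.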


Related research

- More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization (09/26/2022)
  In cooperative multi-agent reinforcement learning (MARL), combining valu...
- Multi-Agent Trust Region Policy Optimization (10/15/2020)
  We extend trust region policy optimization (TRPO) to multi-agent reinfor...
- Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization (08/13/2023)
  This paper presents an extension of the Mirror Descent method to overcom...
- Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO (01/31/2022)
  We present a new monotonic improvement guarantee for optimizing decentra...
- Intention Propagation for Multi-agent Reinforcement Learning (04/19/2020)
  A hallmark of an AI agent is to mimic human beings to understand and int...
- Multi-Agent Cooperation via Unsupervised Learning of Joint Intentions (07/05/2023)
  The field of cooperative multi-agent reinforcement learning (MARL) has s...
- Fast Teammate Adaptation in the Presence of Sudden Policy Change (05/10/2023)
  In cooperative multi-agent reinforcement learning (MARL), where an agent...
