Model-Based Decentralized Policy Optimization

02/16/2023
by Hao Luo, et al.

Decentralized policy optimization is commonly used in cooperative multi-agent tasks. However, because all agents update their policies simultaneously, the environment is non-stationary from the perspective of each individual agent, which makes it hard to guarantee monotonic policy improvement. To make policy improvement stable and monotonic, we propose model-based decentralized policy optimization (MDPO), which incorporates a latent variable function to help construct the transition and reward functions from an individual agent's perspective. We show theoretically that policy optimization in MDPO is more stable than model-free decentralized policy optimization. Moreover, because of non-stationarity, the latent variable function changes over time and is hard to model. We therefore propose a latent variable prediction method to reduce the error of the latent variable function, which theoretically contributes to monotonic policy improvement. Empirically, MDPO outperforms model-free decentralized policy optimization in a variety of cooperative multi-agent tasks.
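
To make the non-stationarity argument concrete, the toy sketch below (not the MDPO algorithm, whose model architecture and predictor are not specified in this abstract) looks at a two-agent cooperative matrix game from agent i's perspective. Agent i's individual reward for its own action is the shared reward marginalized over the other agent's action, so it shifts whenever the other agent's policy shifts. As an assumption for illustration, the other agent's probability of choosing action 1 plays the role of the latent variable z: with z fixed, the individual reward function is stationary. The "latent prediction" step is a deliberately simple stand-in (linear extrapolation of recent z values), not the paper's method. The payoff matrix `R` and the functions `individual_reward` and `predict_next_latent` are hypothetical names.

```python
from typing import List, Sequence

# Hypothetical cooperative 2x2 matrix game: both agents receive R[a_i][a_j].
R = [[1.0, 0.0],
     [0.0, 2.0]]


def individual_reward(a_i: int, z: float) -> float:
    """Agent i's expected reward for action a_i, marginalizing over the other
    agent's action. z = Pr(a_j = 1) stands in for a (hypothetical) latent
    variable: for fixed z this function is stationary, but z drifts as the
    other agent updates its policy."""
    return (1.0 - z) * R[a_i][0] + z * R[a_i][1]


def predict_next_latent(history: Sequence[float]) -> float:
    """Toy latent prediction (an assumption, not MDPO's predictor): linearly
    extrapolate the last two z values and clip to the valid range [0, 1]."""
    if len(history) < 2:
        return history[-1]
    nxt = history[-1] + (history[-1] - history[-2])
    return min(1.0, max(0.0, nxt))


if __name__ == "__main__":
    # The other agent gradually shifts toward action 1 while it learns.
    z_history: List[float] = [0.2, 0.4, 0.6]
    for z in z_history:
        print(f"z={z:.1f}  r_i(0)={individual_reward(0, z):.2f}  "
              f"r_i(1)={individual_reward(1, z):.2f}")
    z_pred = predict_next_latent(z_history)
    print(f"predicted next z={z_pred:.1f}  "
          f"r_i(1) under prediction={individual_reward(1, z_pred):.2f}")
```

In this toy setting agent i's greedy action flips from 0 (at z = 0.2) to 1 (at z = 0.4) even though the shared reward matrix never changes, which is the kind of drift that makes a purely model-free individual learner's target move. Conditioning the individual model on a predicted z, rather than the stale one, is the intuition behind reducing the latent variable error.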
