Decentralized Policy Optimization

11/06/2022
by Kefan Su, et al.

The study of decentralized learning, or independent learning, in cooperative multi-agent reinforcement learning goes back decades. Recent empirical studies show that independent PPO (IPPO) can perform close to, or even better than, methods based on centralized training with decentralized execution on several benchmarks. However, a decentralized actor-critic method with a convergence guarantee remains an open problem. In this paper, we propose decentralized policy optimization (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantees. We derive a novel decentralized surrogate for policy optimization such that monotonic improvement of the joint policy is guaranteed when each agent independently optimizes the surrogate. In practice, this decentralized surrogate can be realized with two adaptive coefficients for policy optimization at each agent. Empirically, we compare DPO with IPPO on a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces as well as fully and partially observable environments. The results show that DPO outperforms IPPO in most tasks, which provides empirical support for our theoretical results.
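
The abstract does not give the form of the surrogate or the rule for its two adaptive coefficients. As a rough illustration only, the sketch below assumes a per-agent objective made of an importance-weighted advantage term minus two penalties (one on the square root of the KL divergence, one on the KL divergence itself), with each coefficient adjusted by the adaptive-KL heuristic familiar from PPO. The function names, the penalty form, the target value, and the constants are all hypothetical and are not taken from the paper.

```python
import math

def adapt_coefficient(beta, observed, target, factor=2.0, tolerance=1.5):
    """Adaptive-penalty heuristic (assumed, PPO-style): grow beta when the
    observed divergence overshoots the target, shrink it when it undershoots."""
    if observed > target * tolerance:
        return beta * factor
    if observed < target / tolerance:
        return beta / factor
    return beta

def penalized_surrogate(ratios, advantages, kl, beta_sqrt, beta_kl):
    """Hypothetical per-agent objective: mean(ratio * advantage)
    minus beta_sqrt * sqrt(KL) minus beta_kl * KL."""
    gain = sum(r * a for r, a in zip(ratios, advantages)) / len(ratios)
    return gain - beta_sqrt * math.sqrt(kl) - beta_kl * kl

# One illustrative update for a single agent, using only its own local data.
beta_sqrt, beta_kl = 1.0, 1.0      # the two coefficients (initial values assumed)
ratios = [1.0, 1.0]                # pi_new(a|o) / pi_old(a|o) on sampled actions
advantages = [1.0, 3.0]            # local advantage estimates
kl = 0.04                          # measured KL between new and old local policy
objective = penalized_surrogate(ratios, advantages, kl, beta_sqrt, beta_kl)

# After the policy step, each coefficient is adjusted independently.
kl_target = 0.01                   # assumed target, not from the paper
beta_sqrt = adapt_coefficient(beta_sqrt, math.sqrt(kl), math.sqrt(kl_target))
beta_kl = adapt_coefficient(beta_kl, kl, kl_target)
```

Each agent would run this with its own observations and its own pair of coefficients, which is what makes the scheme decentralized; whether the paper's coefficients follow this exact rule should be checked against the full text.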


research
09/26/2022

More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization

In cooperative multi-agent reinforcement learning (MARL), combining valu...
research
01/31/2022

Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO

We present a new monotonic improvement guarantee for optimizing decentra...
research
02/16/2023

Model-Based Decentralized Policy Optimization

Decentralized policy optimization has been commonly used in cooperative ...
research
10/22/2018

Multi-Agent Actor-Critic with Generative Cooperative Policy Network

We propose an efficient multi-agent reinforcement learning approach to d...
research
08/19/2022

Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

This paper addresses policy learning in non-stationary environments and ...
research
06/02/2023

Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

Executing actions in a correlated manner is a common strategy for human ...
research
06/27/2021

Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

Recent works have applied the Proximal Policy Optimization (PPO) to the ...
