Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

05/08/2023
by Yulai Zhao, et al.

Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains unclear how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance difference lemma that characterizes the landscape of multi-agent policy optimization, we find that the localized action value function serves as an ideal descent direction for each local policy. Motivated by this observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO. We prove that, under standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate. We extend our algorithm to the off-policy setting and introduce pessimism to policy evaluation, which aligns with experiments. To our knowledge, this is the first provably convergent multi-agent PPO algorithm in cooperative Markov games.
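The idea of updating each agent's local policy along its localized action value can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes a single-state, two-agent cooperative matrix game with a shared payoff and a tabular softmax policy per agent. It also assumes the "localized action value" of an agent's action is the shared payoff averaged over the other agent's current policy. Each agent then takes a KL-regularized (mirror-descent) step, pi_new ∝ pi_old · exp(eta · Q), a common closed-form stand-in for a PPO-style proximal update, and the agents update one after another. The functions `softmax_update`, `localized_q`, `run`, and `expected_return` are hypothetical names. Function approximation, clipping, and the pessimistic off-policy evaluation described above are all omitted.

```python
import math


def softmax_update(policy, q_values, eta):
    """KL-regularized (mirror-descent) step: pi_new is proportional to pi_old * exp(eta * q).

    Hypothetical helper: a closed-form proximal update used here as a stand-in for PPO.
    """
    weights = [p * math.exp(eta * q) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def localized_q(payoff, other_policy, agent):
    """Assumed localized action value: shared payoff of each of `agent`'s actions,
    averaged over the other agent's current policy (single-state game)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    if agent == 0:
        return [sum(other_policy[j] * payoff[i][j] for j in range(n_cols))
                for i in range(n_rows)]
    return [sum(other_policy[i] * payoff[i][j] for i in range(n_rows))
            for j in range(n_cols)]


def run(payoff, eta=1.0, iters=200):
    """Hypothetical driver: both agents start uniform and update sequentially."""
    pi0 = [1.0 / len(payoff)] * len(payoff)
    pi1 = [1.0 / len(payoff[0])] * len(payoff[0])
    for _ in range(iters):
        # Agent 0 steps first; agent 1 then responds to agent 0's updated policy.
        pi0 = softmax_update(pi0, localized_q(payoff, pi1, 0), eta)
        pi1 = softmax_update(pi1, localized_q(payoff, pi0, 1), eta)
    return pi0, pi1


def expected_return(payoff, pi0, pi1):
    """Expected shared payoff under the product of the two local policies."""
    return sum(pi0[i] * pi1[j] * payoff[i][j]
               for i in range(len(pi0)) for j in range(len(pi1)))


if __name__ == "__main__":
    payoff = [[1.0, 0.0], [0.0, 2.0]]  # joint action (1, 1) is the global optimum
    pi0, pi1 = run(payoff)
    print(pi0, pi1, expected_return(payoff, pi0, pi1))
```

On this particular payoff table, both local policies concentrate on action 1, and the expected shared payoff approaches the optimal value of 2. The toy game only illustrates the update rule and is not evidence for the convergence guarantee, which the paper establishes under its own regularity conditions.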
