Near-Optimal Last-iterate Convergence of Policy Optimization in Zero-sum Polymatrix Markov games

08/15/2023
by Zailin Ma, et al.

Computing approximate Nash equilibria in multi-player general-sum Markov games is a computationally intractable task. However, multi-player Markov games with certain cooperative or competitive structures might circumvent this intractability. In this paper, we focus on multi-player zero-sum polymatrix Markov games, where players interact in a pairwise fashion while remaining competitive overall. To the best of our knowledge, we propose the first policy optimization algorithm, called Entropy-Regularized Optimistic-Multiplicative-Weights-Update (ER-OMWU), for finding approximate Nash equilibria in finite-horizon zero-sum polymatrix Markov games with full information feedback. We provide last-iterate convergence guarantees for finding an ϵ-approximate Nash equilibrium within Õ(1/ϵ) iterations. This is near-optimal compared to the optimal O(1/ϵ) iteration complexity in two-player zero-sum Markov games, the degenerate case of zero-sum polymatrix games with only two players. Our algorithm combines regularized and optimistic learning dynamics with a separate smooth value update within a single loop, where players update their strategies in a symmetric and almost uncoupled manner. It provides natural dynamics for finding equilibria and is more likely to be adaptable, in future work, to a sample-efficient and fully decentralized implementation where only partial information feedback is available.
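
To make the "regularized and optimistic learning dynamics" concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' ER-OMWU: it drops the Markov structure (states, finite horizon, smooth value update) and runs a generic entropy-regularized optimistic multiplicative-weights step on a single-state zero-sum polymatrix game. The pairwise construction A[j][i] = -A[i][j].T, the step size eta, the regularization weight tau, and every function name are illustrative assumptions. Because tau > 0, the iterates are expected to settle at the entropy-regularized (quantal response) equilibrium x_i = softmax(g_i / tau) for a small enough eta, rather than at an exact Nash equilibrium.

```python
import numpy as np


def make_zero_sum_polymatrix(num_players, num_actions, rng):
    """Random pairwise payoff matrices with A[j][i] = -A[i][j].T.

    Assumption: pairwise antisymmetry is one simple way to make the
    polymatrix game zero-sum (all utilities sum to zero for every profile).
    """
    A = [[None] * num_players for _ in range(num_players)]
    for i in range(num_players):
        for j in range(i + 1, num_players):
            M = rng.uniform(-1.0, 1.0, size=(num_actions, num_actions))
            A[i][j] = M
            A[j][i] = -M.T
    return A


def payoff_gradients(A, strategies):
    """g_i = sum_{j != i} A[i][j] @ x_j: gradient of player i's utility."""
    n = len(strategies)
    return [sum(A[i][j] @ strategies[j] for j in range(n) if j != i)
            for i in range(n)]


def utilities(A, strategies):
    """u_i = x_i^T g_i, summed over all pairwise games player i is in."""
    grads = payoff_gradients(A, strategies)
    return [float(x @ g) for x, g in zip(strategies, grads)]


def softmax(logits):
    """Numerically stable normalization of log-weights to a distribution."""
    w = np.exp(logits - logits.max())
    return w / w.sum()


def mwu_step(x, grad, eta, tau):
    """Entropy-regularized MWU: x_new ~ x**(1 - eta*tau) * exp(eta*grad), normalized."""
    return softmax((1.0 - eta * tau) * np.log(x) + eta * grad)


def er_omwu_sketch(A, num_actions, eta=0.1, tau=0.1, iters=5000):
    """Hypothetical single-state sketch of entropy-regularized optimistic MWU.

    Not the paper's ER-OMWU: no states, horizon, or value update.
    """
    n = len(A)
    x = [np.full(num_actions, 1.0 / num_actions) for _ in range(n)]
    x_bar = [xi.copy() for xi in x]
    for _ in range(iters):
        # Optimistic prediction: reuse gradients at the previous midpoint.
        g_pred = payoff_gradients(A, x_bar)
        x_bar = [mwu_step(xi, gi, eta, tau) for xi, gi in zip(x, g_pred)]
        # Actual update from the old x, using gradients at the new midpoint.
        g_mid = payoff_gradients(A, x_bar)
        x = [mwu_step(xi, gi, eta, tau) for xi, gi in zip(x, g_mid)]
    return x


def qre_residual(A, strategies, tau):
    """Max deviation from the regularized fixed point x_i = softmax(g_i / tau)."""
    grads = payoff_gradients(A, strategies)
    return max(float(np.abs(x - softmax(g / tau)).max())
               for x, g in zip(strategies, grads))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = make_zero_sum_polymatrix(num_players=3, num_actions=3, rng=rng)
    x = er_omwu_sketch(A, num_actions=3)
    print("QRE residual:", qre_residual(A, x, tau=0.1))
```

Each player applies the same update using only its own payoff gradient, which loosely mirrors the symmetric, almost uncoupled updates described above. Shrinking tau moves the regularized equilibrium toward a Nash equilibrium, at the cost of a slower per-step contraction factor of roughly 1 - eta*tau.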

research
10/03/2022

Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

Multi-Agent Reinforcement Learning (MARL) – where multiple agents learn ...
research
09/26/2022

O(T^-1) Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games

We prove that optimistic-follow-the-regularized-leader (OFTRL), together...
research
06/06/2022

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

This paper studies policy optimization algorithms for multi-agent reinfo...
research
07/13/2023

Multi-Player Zero-Sum Markov Games with Networked Separable Interactions

We study a new class of Markov games (MGs), Multi-player Zero-sum Markov...
research
09/09/2019

A fixed-point policy-iteration-type algorithm for symmetric nonzero-sum stochastic impulse games

Nonzero-sum stochastic differential games with impulse controls offer a ...
research
09/14/2020

Optimal market making under partial information and numerical methods for impulse control games with applications

The topics treated in this thesis are inherently two-fold. The first par...
research
02/08/2021

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

We study infinite-horizon discounted two-player zero-sum Markov games, a...
