Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

02/08/2021
by Chen-Yu Wei, et al.

We study infinite-horizon discounted two-player zero-sum Markov games and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play. Our algorithm runs an Optimistic Gradient Descent Ascent algorithm on each state to learn the policies, together with a critic that slowly learns the value of each state. To the best of our knowledge, this is the first algorithm in this setting that is simultaneously rational (converging to a best response against an opponent that uses a stationary policy), convergent (converging to the set of Nash equilibria under self-play), agnostic (requiring no knowledge of the actions played by the opponent), symmetric (the players take symmetric roles in the algorithm), and equipped with a finite-time last-iterate convergence guarantee, all of which are desirable properties of decentralized algorithms.
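
As a rough illustration of the per-state building block, the sketch below runs one common form of projected Optimistic Gradient Descent Ascent on a single zero-sum matrix game and returns the last iterate. It is not the paper's algorithm: it covers one state only, omits the critic, and uses exact expected payoffs rather than the agnostic feedback described above. The payoff matrix `G`, the step size `eta`, the step count, and all function names are illustrative assumptions.

```python
def project_simplex(v):
    """Euclidean projection of the vector v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:          # holds for a prefix of the sorted entries
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def ogda_matrix_game(G, eta=0.1, steps=2000):
    """Projected OGDA for min_x max_y x^T G y; returns the last iterate.

    Assumption: eta is small relative to the scale of G. This is a
    single-state sketch with no critic, not the paper's full method.
    """
    n, m = len(G), len(G[0])
    x_hat, y_hat = [1.0 / n] * n, [1.0 / m] * m   # auxiliary iterates
    x, y = x_hat[:], y_hat[:]                     # policies actually played
    for _ in range(steps):
        # Gradients evaluated at the currently played policies.
        gx = [sum(G[i][j] * y[j] for j in range(m)) for i in range(n)]
        gy = [sum(G[i][j] * x[i] for i in range(n)) for j in range(m)]
        # Standard step on the auxiliary iterates (descent for x, ascent for y).
        x_hat = project_simplex([x_hat[i] - eta * gx[i] for i in range(n)])
        y_hat = project_simplex([y_hat[j] + eta * gy[j] for j in range(m)])
        # Optimistic step: reuse the latest gradient as a prediction of the next.
        x = project_simplex([x_hat[i] - eta * gx[i] for i in range(n)])
        y = project_simplex([y_hat[j] + eta * gy[j] for j in range(m)])
    return x, y


def duality_gap(G, x, y):
    """Best-response payoff for y against x minus best-response payoff for x against y."""
    n, m = len(G), len(G[0])
    best_for_y = max(sum(G[i][j] * x[i] for i in range(n)) for j in range(m))
    best_for_x = min(sum(G[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_for_y - best_for_x
```

For example, `ogda_matrix_game([[2.0, -1.0], [-1.0, 1.0]])` can be checked with `duality_gap`, which is zero exactly at a Nash equilibrium of the matrix game. Returning the last iterate rather than a running average is deliberate, since the guarantee described above concerns the last iterate.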


research
10/06/2021

O(1/T) Time-Average Convergence in a Generalization of Multiagent Zero-Sum Games

We introduce a generalization of zero-sum network multiagent matrix game...
research
09/17/2021

Solving infinite-horizon Dec-POMDPs using Finite State Controllers within JESP

This paper looks at solving collaborative planning problems formalized a...
research
08/15/2023

Near-Optimal Last-iterate Convergence of Policy Optimization in Zero-sum Polymatrix Markov games

Computing approximate Nash equilibria in multi-player general-sum Markov...
research
07/07/2022

Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

Recent extensions to dynamic games of the well-known fictitious play lea...
research
03/03/2023

Can We Find Nash Equilibria at a Linear Rate in Markov Games?

We study decentralized learning in two-player zero-sum discounted Markov...
research
01/08/2014

Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

We consider the problem of finding stationary Nash equilibria (NE) in a ...
research
07/01/2003

AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response Against Stationary Opponents

A satisfactory multiagent learning algorithm should, at a minimum, lear...
