
O(T^-1) Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games

by Yuepeng Yang, et al.

We prove that optimistic-follow-the-regularized-leader (OFTRL), together with smooth value updates, finds an O(T^-1)-approximate Nash equilibrium in T iterations for two-player zero-sum Markov games with full information. This improves on the Õ(T^-5/6) convergence rate recently shown by Zhang et al. (2022). The refined analysis hinges on two essential ingredients. First, the sum of the two players' regrets, though not necessarily non-negative as it is in normal-form games, is approximately non-negative in Markov games. This property allows us to bound the second-order path lengths of the learning dynamics. Second, we prove a tighter algebraic inequality on the weights deployed by OFTRL, which shaves off an extra log T factor. This crucial improvement enables the inductive analysis that leads to the final O(T^-1) rate.
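
To make the OFTRL update concrete, the following is a minimal sketch for the simplest special case only: a single-state game, i.e. a zero-sum matrix game, with an entropy regularizer and uniform averaging of the iterates. It is not the algorithm analyzed in the paper, which additionally uses smooth value updates and a specific weighting scheme across the states of a Markov game. The function names, the payoff matrix, and the step size `eta` are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def oftrl_matrix_game(A, T, eta=0.1):
    """Entropy-regularized OFTRL for min_x max_y x^T A y (illustrative sketch).

    Each player plays the regularized leader against the cumulative
    feedback plus a prediction of the next feedback, where the prediction
    is simply the most recent feedback vector (the "optimistic" term).
    Returns the uniformly averaged strategies of both players.
    """
    n, m = A.shape
    cum_x, cum_y = np.zeros(n), np.zeros(m)    # cumulative feedback
    pred_x, pred_y = np.zeros(n), np.zeros(m)  # optimistic predictions
    avg_x, avg_y = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x = softmax(-eta * (cum_x + pred_x))   # min player: low loss favored
        y = softmax(eta * (cum_y + pred_y))    # max player: high payoff favored
        loss_x = A @ y                         # loss vector seen by the min player
        gain_y = A.T @ x                       # payoff vector seen by the max player
        cum_x += loss_x
        cum_y += gain_y
        pred_x, pred_y = loss_x, gain_y        # predict next feedback = last feedback
        avg_x += x
        avg_y += y
    return avg_x / T, avg_y / T


def duality_gap(A, x, y):
    """Best-response gap of (x, y); zero exactly at a Nash equilibrium."""
    return float((A.T @ x).max() - (A @ y).min())


if __name__ == "__main__":
    # Hypothetical 2x2 payoff matrix with entries in [-1, 1].
    A = np.array([[1.0, -1.0], [-1.0, 0.5]])
    for T in (100, 1000, 10000):
        x_bar, y_bar = oftrl_matrix_game(A, T)
        print(T, duality_gap(A, x_bar, y_bar))
```

In this normal-form setting the sum of the two players' regrets is non-negative, so the duality gap of the averaged strategies is controlled by that sum divided by T. The abstract's first ingredient is what replaces this property in the Markov-game case, where the sum is only approximately non-negative.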



