O(T^-1) Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games

09/26/2022
by Yuepeng Yang, et al.

We prove that optimistic-follow-the-regularized-leader (OFTRL), together with smooth value updates, finds an O(T^-1)-approximate Nash equilibrium in T iterations for two-player zero-sum Markov games with full information. This improves upon the Õ(T^-5/6) convergence rate recently shown by Zhang et al. (2022). The refined analysis hinges on two essential ingredients. First, the sum of the regrets of the two players, though not necessarily non-negative as it is in normal-form games, is approximately non-negative in Markov games. This property allows us to bound the second-order path lengths of the learning dynamics. Second, we prove a tighter algebraic inequality regarding the weights deployed by OFTRL that shaves off an extra log T factor. This crucial improvement enables the inductive analysis that leads to the final O(T^-1) rate.
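To make the learning dynamic concrete, below is a minimal sketch of OFTRL with entropy regularization (equivalently, optimistic multiplicative weights) on a single two-player zero-sum matrix game, paired with a weighted averaging of the iterates in the spirit of the smooth value updates. The payoff matrix, step size eta, horizon T, and the weighting schedule alpha_t are illustrative assumptions, not the exact constants or the full Markov-game procedure analyzed in the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax; maps cumulative payoffs to a mixed strategy.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def oftrl_zero_sum(A, T=1000, eta=0.1):
    """OFTRL for both players on the zero-sum game max_x min_y x^T A y (a sketch)."""
    m, n = A.shape
    grad_x_sum = np.zeros(m)            # cumulative payoff vectors seen by the max-player
    grad_y_sum = np.zeros(n)            # cumulative loss vectors seen by the min-player
    last_gx, last_gy = np.zeros(m), np.zeros(n)
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for t in range(1, T + 1):
        # Optimistic step: follow the regularized leader on past gradients
        # plus a prediction equal to the most recent gradient.
        x = softmax(eta * (grad_x_sum + last_gx))
        y = softmax(-eta * (grad_y_sum + last_gy))
        gx, gy = A @ y, A.T @ x          # current-round payoffs / losses
        grad_x_sum += gx
        grad_y_sum += gy
        last_gx, last_gy = gx, gy
        # Weighted ("smooth") averaging of the iterates: later iterates get
        # larger weights.  alpha = 2/(t+1) is an assumed schedule for illustration.
        alpha = 2.0 / (t + 1)
        avg_x = (1 - alpha) * avg_x + alpha * x
        avg_y = (1 - alpha) * avg_y + alpha * y
    return avg_x, avg_y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    x_bar, y_bar = oftrl_zero_sum(A)
    # Duality gap of the averaged policies; it should shrink as T grows.
    gap = (A @ y_bar).max() - (A.T @ x_bar).min()
    print(f"approximate duality gap: {gap:.4f}")
```

In the Markov-game setting the same per-state update is combined with smoothly updated value estimates that feed back into each state's payoff matrix; the sketch above only illustrates the normal-form building block.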


