Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes

05/11/2019
by James P. Bailey, et al.

We show, for the first time to our knowledge, that online learning in zero-sum games can reconcile two seemingly contradictory objectives: vanishing time-average regret and non-vanishing step sizes. This phenomenon, which we call "fast and furious" learning in games, sets a new benchmark for what is possible both in max-min optimization and in multi-agent systems. Our analysis does not depend on introducing a carefully tailored dynamic. Instead, we focus on the most well-studied online dynamic, gradient descent. Similarly, we focus on the simplest textbook class of games, two-agent two-strategy zero-sum games, such as Matching Pennies. Even for this simplest of benchmarks, the best known bound for total regret prior to our work was the trivial O(T), which applies even to a non-learning agent. Based on a tight understanding of the geometry of the non-equilibrating trajectories in the dual space, we prove a regret bound of Θ(√T), matching the well-known optimal bound for adaptive step sizes in the online setting. This guarantee holds for all fixed step sizes, without having to know the time horizon in advance and tune the step size accordingly. As a corollary, we establish that even with fixed learning rates, the time-averages of mixed strategies and utilities converge to their exact Nash equilibrium values.
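
The setting described in the abstract can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes the "lazy" form of projected gradient descent, in which each player accumulates payoff vectors in a dual variable and plays the Euclidean projection of the scaled dual variable onto the simplex. The abstract says only "gradient descent", so this choice of variant, the initial dual values, and every name here (`run_gradient_descent`, `project_to_simplex_2d`) are assumptions made for illustration. The sketch tracks player 1's regret against the best fixed strategy in hindsight and the time-averaged strategies of both players in Matching Pennies.

```python
import math

# Matching Pennies payoff matrix for player 1 (row player).
# Player 2 receives the negative of player 1's payoff.
A = [[1.0, -1.0], [-1.0, 1.0]]


def project_to_simplex_2d(v0, v1):
    """Euclidean projection of (v0, v1) onto {(p, 1 - p) : 0 <= p <= 1}."""
    p = min(1.0, max(0.0, (v0 - v1 + 1.0) / 2.0))
    return p, 1.0 - p


def run_gradient_descent(eta, horizon, dual1=(0.3, 0.0), dual2=(0.0, 0.1)):
    """Simulate both players with a fixed step size eta (assumed lazy variant).

    Returns player 1's regret after each round and both time-averaged strategies.
    The initial dual values are arbitrary illustrative choices.
    """
    y1, y2 = list(dual1), list(dual2)   # dual variables (accumulated payoffs)
    cum_payoff1 = [0.0, 0.0]            # cumulative payoff of each pure strategy
    realized1 = 0.0                     # cumulative payoff player 1 actually earned
    sum_x1, sum_x2 = [0.0, 0.0], [0.0, 0.0]
    regrets = []
    for _ in range(horizon):
        # Both players act simultaneously from their current dual variables.
        x1 = project_to_simplex_2d(eta * y1[0], eta * y1[1])
        x2 = project_to_simplex_2d(eta * y2[0], eta * y2[1])
        # Payoff vectors: g1 = A x2 for player 1, g2 = -A^T x1 for player 2.
        g1 = [A[i][0] * x2[0] + A[i][1] * x2[1] for i in range(2)]
        g2 = [-(A[0][j] * x1[0] + A[1][j] * x1[1]) for j in range(2)]
        realized1 += x1[0] * g1[0] + x1[1] * g1[1]
        for i in range(2):
            cum_payoff1[i] += g1[i]
            y1[i] += g1[i]
            y2[i] += g2[i]
            sum_x1[i] += x1[i]
            sum_x2[i] += x2[i]
        # Regret: best fixed pure strategy in hindsight minus realized payoff.
        regrets.append(max(cum_payoff1) - realized1)
    avg_x1 = [s / horizon for s in sum_x1]
    avg_x2 = [s / horizon for s in sum_x2]
    return regrets, avg_x1, avg_x2


if __name__ == "__main__":
    regrets, avg_x1, avg_x2 = run_gradient_descent(eta=0.5, horizon=10_000)
    for t in (100, 1_000, 10_000):
        # If regret grows like sqrt(T), this ratio should stay bounded.
        print(t, regrets[t - 1] / math.sqrt(t))
    print("time-averaged strategies:", avg_x1, avg_x2)
```

Printing the regret divided by √T at several horizons gives a rough check on the claimed growth rate, and the time-averaged strategies can be compared with the Matching Pennies equilibrium (1/2, 1/2). A simulation of this kind only illustrates the statement; it does not substitute for the proof.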

research
05/21/2019

Vortices Instead of Equilibria in MinMax Optimization: Chaos and Butterfly Effects of Online Learning in Zero-Sum Games

We establish that algorithmic experiments in zero-sum games "fail misera...
research
11/29/2021

Optimal No-Regret Learning in General Games: Bounded Regret with Unbounded Step-Sizes via Clairvoyant MWU

In this paper we solve the problem of no-regret learning in general game...
research
05/29/2022

No-Regret Learning in Network Stochastic Zero-Sum Games

No-regret learning has been widely used to compute a Nash equilibrium in...
research
09/08/2021

Learning Zero-sum Stochastic Games with Posterior Sampling

In this paper, we propose Posterior Sampling Reinforcement Learning for ...
research
01/23/2021

Optimistic and Adaptive Lagrangian Hedging

In online learning an algorithm plays against an environment with losses...
research
02/12/2018

Let's be honest: An optimal no-regret framework for zero-sum games

We revisit the problem of solving two-player zero-sum games in the decen...
research
03/05/2019

Multi-Agent Learning in Network Zero-Sum Games is a Hamiltonian System

Zero-sum games are natural, if informal, analogues of closed physical sy...
