Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium

04/26/2021
by Yu-Guan Hsieh, et al.

In game-theoretic learning, several agents simultaneously follow their individual interests, so the environment is non-stationary from each player's perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may drive the system to an equilibrium, while others could produce cyclic, chaotic, or otherwise divergent trajectories. To account for this, we propose a range of no-regret policies based on optimistic mirror descent, with the following desirable properties: i) they do not require any prior tuning or knowledge of the game; ii) they all achieve O(√T) regret against arbitrary, adversarial opponents; and iii) they converge to the best response against convergent opponents. In addition, if employed by all players, iv) they guarantee O(1) social regret, and v) the induced sequence of play converges to Nash equilibrium with O(1) individual regret in all variationally stable games (a class of games that includes all monotone and convex-concave zero-sum games).
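To make the idea of a tuning-free optimistic update concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes the Euclidean special case of optimistic mirror descent (projected optimistic gradient steps), a hypothetical toy zero-sum game f(x, y) = x·y on [-1, 1]², and an AdaGrad-style step size built from each player's own accumulated gradient variation. The function name `adaptive_optimistic_gradient` and the exact step-size rule are illustrative assumptions; the policies in the full text may differ.

```python
import math


def clip(v, lo=-1.0, hi=1.0):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def adaptive_optimistic_gradient(x0, y0, steps):
    """Run optimistic gradient play on the toy game f(x, y) = x * y.

    Player 1 picks x to minimize f; player 2 picks y to maximize f.
    Both actions live in [-1, 1]; the unique Nash equilibrium is (0, 0).
    The step-size rule below is an illustrative assumption, not
    necessarily the rule used in the paper.
    """
    x, y = x0, y0
    gx_prev, gy_prev = 0.0, 0.0  # gradients seen at the previous leading state
    sx, sy = 0.0, 0.0            # per-player sums of squared gradient variation
    for _ in range(steps):
        # Each player sets its step size from its own past feedback only,
        # so no knowledge of the game (e.g. a Lipschitz constant) is needed.
        eta_x = 1.0 / math.sqrt(1.0 + sx)
        eta_y = 1.0 / math.sqrt(1.0 + sy)
        # Optimistic (leading) step: reuse the last gradient as a prediction.
        x_lead = clip(x - eta_x * gx_prev)
        y_lead = clip(y - eta_y * gy_prev)
        # Feedback: each player's gradient of its own loss at the leading state.
        gx = y_lead    # d/dx of  x * y  (player 1's loss)
        gy = -x_lead   # d/dy of -x * y  (player 2's loss)
        # Base update using the freshly observed gradient.
        x = clip(x - eta_x * gx)
        y = clip(y - eta_y * gy)
        # Accumulate how much the feedback changed between rounds.
        sx += (gx - gx_prev) ** 2
        sy += (gy - gy_prev) ** 2
        gx_prev, gy_prev = gx, gy
    return x, y


if __name__ == "__main__":
    x, y = adaptive_optimistic_gradient(0.8, -0.6, 5000)
    print(f"final play: x={x:.6f}, y={y:.6f}")
```

In this sketch the step sizes start large and shrink only while the observed gradients keep changing. Once play stabilizes, the accumulated variation stops growing and the step sizes settle at a constant level, which is the intuition behind pairing fast convergence under self-play with robustness to adversarial opponents.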


