No-Regret Learning in Time-Varying Zero-Sum Games

01/30/2022
by Mengxiao Zhang, et al.

Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. We consider a variant of this problem in which the game's payoff matrix changes over time, possibly in an adversarial manner. We first present three performance measures to guide algorithm design for this problem: 1) the well-studied individual regret, 2) an extension of the duality gap, and 3) a new measure called dynamic Nash Equilibrium regret, which quantifies the cumulative difference between the player's payoff and the minimax game value. Next, we develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under all three performance measures. These guarantees adapt to different non-stationarity measures of the payoff matrices and, importantly, recover the best known results when the payoff matrix is fixed. Our algorithm is based on a two-layer structure in which a meta-algorithm learns over a group of black-box base-learners satisfying a certain property, along with several novel ingredients specifically designed for the time-varying game setting. Empirical results further validate the effectiveness of our algorithm.
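To make the three measures concrete, the following is a minimal Python sketch, not the paper's algorithm. The abstract does not give formal definitions, so the formulas here are assumptions based on the standard formulations: the x-player minimizes x^T A_t y, individual regret compares the realized payoff with the best fixed strategy in hindsight, the duality gap is summed per round, and dynamic Nash Equilibrium regret is the absolute difference between the cumulative realized payoff and the cumulative minimax value. All function names are hypothetical, and the per-round game values are assumed to be supplied by the caller (for example, from a linear program solver).

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]
Vector = Sequence[float]


def bilinear(x: Vector, A: Matrix, y: Vector) -> float:
    """Payoff x^T A y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def row_losses(A: Matrix, y: Vector) -> List[float]:
    """Vector A y: payoff of each pure row action against y."""
    return [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A))]


def col_gains(A: Matrix, x: Vector) -> List[float]:
    """Vector x^T A: payoff of each pure column action against x."""
    return [sum(x[i] * A[i][j] for i in range(len(x)))
            for j in range(len(A[0]))]


def individual_regret_x(As, xs, ys) -> float:
    """Assumed form: sum_t x_t^T A_t y_t - min_x sum_t x^T A_t y_t.

    The minimum over the simplex is attained at a pure action, so it is
    enough to take the smallest entry of the cumulative loss vector.
    """
    realized = sum(bilinear(x, A, y) for A, x, y in zip(As, xs, ys))
    cumulative = [0.0] * len(As[0])
    for A, y in zip(As, ys):
        cumulative = [c + l for c, l in zip(cumulative, row_losses(A, y))]
    return realized - min(cumulative)


def cumulative_duality_gap(As, xs, ys) -> float:
    """Assumed form: sum_t (max_y x_t^T A_t y - min_x x^T A_t y_t)."""
    return sum(max(col_gains(A, x)) - min(row_losses(A, y))
               for A, x, y in zip(As, xs, ys))


def dynamic_ne_regret(As, xs, ys, game_values: Sequence[float]) -> float:
    """Assumed form: |sum_t x_t^T A_t y_t - sum_t v_t|, where v_t is the
    minimax value of A_t, passed in precomputed by the caller."""
    realized = sum(bilinear(x, A, y) for A, x, y in zip(As, xs, ys))
    return abs(realized - sum(game_values))
```

As a usage illustration, take a two-round sequence that alternates between matching pennies `[[1, -1], [-1, 1]]` and its negation, with both players always choosing their first action. Under the assumed definitions, the individual regret and the dynamic Nash Equilibrium regret are both zero while the cumulative duality gap is 4, which suggests why a single measure may not capture performance when the game changes.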


research
07/17/2019

Competing Against Equilibria in Zero-Sum Games with Evolving Payoffs

We study the problem of repeated play in a zero-sum game in which the pa...
research
07/28/2021

Efficient Episodic Learning of Nonstationary and Unknown Zero-Sum Games Using Expert Game Ensembles

Game theory provides essential analysis in many applications of strategi...
research
01/26/2023

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

Most of the literature on learning in games has focused on the restricti...
research
09/10/2018

Learning in time-varying games

In this paper, we examine the long-term behavior of regret-minimizing ag...
research
06/13/2018

DRE-Bot: A Hierarchical First Person Shooter Bot Using Multiple Sarsa(λ) Reinforcement Learners

This paper describes an architecture for controlling non-player characte...
research
07/22/2020

Exploiting No-Regret Algorithms in System Design

We investigate a repeated two-player zero-sum game setting where the col...
research
02/14/2022

Temporal Properties of Vaccine Effectiveness Measures in Presence of Multiple Pathogen Variants and Multiple Vaccines

Vaccine effectiveness (VE) is typically defined as incidence rate ratio,...
