Policy Regret in Repeated Games

11/09/2018
by Raman Arora, et al.

The notion of policy regret in online learning is a well-defined performance measure for the common scenario of adaptive adversaries, which more traditional quantities such as external regret do not take into account. We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play that achieves a favorable regret with respect to one definition must do poorly with respect to the other. We then focus on the game-theoretic setting where the adversary is a self-interested agent. In that setting, we show that external regret and policy regret are not in conflict and, in fact, that a wide class of algorithms can ensure a favorable regret with respect to both definitions, so long as the adversary is also using such an algorithm. We also show that the sequence of play of no-policy-regret algorithms converges to a policy equilibrium, a new notion of equilibrium that we introduce. Relating this back to external regret, we show that coarse correlated equilibria, to which no-external-regret players converge, are a strict subset of policy equilibria. Thus, in game-theoretic settings, every sequence of play with no external regret also admits no policy regret, but the converse does not hold.
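To make the difference between the two counterfactuals concrete, here is a minimal Python sketch under assumed definitions. A two-action learner faces a hypothetical memory-one adaptive adversary whose loss table `LOSS` is invented for illustration; it is not a construction from the paper. External regret compares the learner against swapping only the current action in each round while keeping its realized history fixed. Policy regret compares it against replaying the whole game with one fixed action, so the adversary's reactions change as well. The names `LOSS`, `INITIAL_PREV`, `total_loss`, `external_regret`, and `policy_regret` are all hypothetical.

```python
from typing import Sequence

# Hypothetical memory-one adaptive adversary (illustrative only, not from the paper).
# Actions are 0 and 1; LOSS[prev][cur] is the loss of playing `cur` right after
# the learner played `prev`.
LOSS = [
    [0.0, 1.0],  # after action 0: action 0 costs 0, action 1 costs 1
    [1.0, 0.5],  # after action 1: action 0 costs 1, action 1 costs 0.5
]
ACTIONS = (0, 1)
INITIAL_PREV = 0  # assumption: before round 1 the "previous action" is 0


def loss(prev: int, cur: int) -> float:
    """Loss the adversary assigns to `cur` given the learner's previous action."""
    return LOSS[prev][cur]


def total_loss(plays: Sequence[int]) -> float:
    """Cumulative loss of a sequence of play; the adversary reacts to the history."""
    total, prev = 0.0, INITIAL_PREV
    for a in plays:
        total += loss(prev, a)
        prev = a
    return total


def external_regret(plays: Sequence[int]) -> float:
    """Compare against swapping only the current action in each round,
    holding the learner's realized history (and hence the adversary) fixed."""
    prevs = [INITIAL_PREV, *plays][: len(plays)]
    best_fixed = min(sum(loss(p, a) for p in prevs) for a in ACTIONS)
    return total_loss(plays) - best_fixed


def policy_regret(plays: Sequence[int]) -> float:
    """Compare against replaying the whole game with one fixed action,
    so the adversary's responses change along with the comparator."""
    best_policy = min(total_loss([a] * len(plays)) for a in ACTIONS)
    return total_loss(plays) - best_policy


if __name__ == "__main__":
    always_one = [1] * 10
    print(external_regret(always_one))  # 0.0
    print(policy_regret(always_one))    # 5.5
```

Under this toy adversary, always playing action 1 has zero external regret: once the history is fixed at 1, deviating in any single round costs more. However, the fixed policy "always play 0" would have incurred zero loss, because the adversary never reaches the state where action 0 is punished. The sketch only shows that the two measures can disagree. The incompatibility result in the abstract is a stronger statement about every sequence of play in certain settings.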

research
06/27/2012

Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

Online learning algorithms are designed to learn even when their input i...
research
03/08/2021

No Discounted-Regret Learning in Adversarial Bandits with Delays

Consider a player that in each round t out of T rounds chooses an action...
research
02/13/2023

Achieving Better Regret against Strategic Adversaries

We study online learning problems in which the learner has extra knowled...
research
05/31/2023

Is Learning in Games Good for the Learners?

We consider a number of questions related to tradeoffs between reward an...
research
08/09/2021

Online Multiobjective Minimax Optimization and Applications

We introduce a simple but general online learning framework, in which at...
research
11/13/2018

A Local Regret in Nonconvex Online Learning

We consider an online learning process to forecast a sequence of outcome...
research
06/19/2020

Gradient-free Online Learning in Games with Delayed Rewards

Motivated by applications to online advertising and recommender systems,...