Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

06/27/2012
by Raman Arora, et al.

Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm's ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm's actions. We define the alternative notion of policy regret, which attempts to provide a more meaningful way to measure an online algorithm's performance against adaptive adversaries. Focusing on the online bandit setting, we show that no bandit algorithm can guarantee a sublinear policy regret against an adaptive adversary with unbounded memory. On the other hand, if the adversary's memory is bounded, we present a general technique that converts any bandit algorithm with a sublinear regret bound into an algorithm with a sublinear policy regret bound. We extend this result to other variants of regret, such as switching regret, internal regret, and swap regret.
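
The conversion mentioned in the abstract can be illustrated with a minimal sketch, assuming a mini-batching wrapper: the base bandit algorithm picks an action, the wrapper repeats it for a whole batch of rounds, and only the batch-average loss is fed back. The abstract does not spell out the construction, so treat this as one plausible illustration rather than the paper's algorithm. The class names `Exp3` and `MiniBatched`, the function `run`, the toy memory-1 adversary, the loss values, and the learning-rate choice are all hypothetical and chosen for the example.

```python
import math
import random


class Exp3:
    """Basic Exp3 for losses in [0, 1]; a stand-in base bandit algorithm."""

    def __init__(self, n_arms, eta, rng):
        self.n_arms = n_arms
        self.eta = eta
        self.rng = rng
        self.cum_loss_est = [0.0] * n_arms  # importance-weighted loss estimates
        self.probs = [1.0 / n_arms] * n_arms

    def select(self):
        low = min(self.cum_loss_est)  # shift exponents for numerical stability
        weights = [math.exp(-self.eta * (l - low)) for l in self.cum_loss_est]
        total = sum(weights)
        self.probs = [w / total for w in weights]
        return self.rng.choices(range(self.n_arms), weights=self.probs)[0]

    def update(self, arm, loss):
        # Only the played arm is observed, so divide by its selection probability.
        self.cum_loss_est[arm] += loss / self.probs[arm]


class MiniBatched:
    """Repeat each base action for `batch` rounds; feed back the average loss."""

    def __init__(self, base, batch):
        self.base = base
        self.batch = batch
        self.arm = None
        self.losses = []

    def select(self):
        if self.arm is None:  # start of a new batch
            self.arm = self.base.select()
        return self.arm

    def update(self, loss):
        self.losses.append(loss)
        if len(self.losses) == self.batch:  # batch complete: one base update
            self.base.update(self.arm, sum(self.losses) / self.batch)
            self.arm, self.losses = None, []


def run(horizon=3000, n_arms=3, batch=10, seed=0):
    """Play against a toy adversary whose loss depends on the previous action."""
    rng = random.Random(seed)
    n_batches = max(1, horizon // batch)
    eta = math.sqrt(2 * math.log(n_arms) / (n_arms * n_batches))  # assumed tuning
    learner = MiniBatched(Exp3(n_arms, eta, rng), batch)
    stay_loss = [0.2 + 0.6 * i / max(1, n_arms - 1) for i in range(n_arms)]
    actions, total, prev = [], 0.0, None
    for _ in range(horizon):
        arm = learner.select()
        # Memory-1 adversary: switching arms costs 1, staying costs stay_loss[arm].
        loss = 1.0 if (prev is not None and arm != prev) else stay_loss[arm]
        learner.update(loss)
        actions.append(arm)
        total += loss
        prev = arm
    # Policy regret compares against replaying the best fixed arm from round 1,
    # which never switches and therefore pays min(stay_loss) every round.
    policy_regret = total - horizon * min(stay_loss)
    return actions, total, policy_regret
```

Calling `run()` returns the action sequence, the cumulative loss, and the policy regret against the best constant action. Because the wrapper holds each action fixed within a batch, an adversary with bounded memory can penalize at most the first few rounds of each batch for the preceding switch, which is the intuition behind trading batch length against the number of base-algorithm updates.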

research
04/24/2022

Complete Policy Regret Bounds for Tallying Bandits

Policy regret is a well established notion of measuring the performance ...
research
11/09/2018

Policy Regret in Repeated Games

The notion of policy regret in online learning is a well-defined perfor...
research
10/28/2020

Provably Efficient Online Agnostic Learning in Markov Games

We study online agnostic learning, a problem that arises in episodic mul...
research
07/26/2023

Online learning in bandits with predicted context

We consider the contextual bandit problem where at each time, the agent ...
research
11/13/2018

A Local Regret in Nonconvex Online Learning

We consider an online learning process to forecast a sequence of outcome...
research
02/18/2013

Online Learning with Switching Costs and Other Adaptive Adversaries

We study the power of different types of adaptive (nonoblivious) adversa...
research
07/16/2022

Online Prediction in Sub-linear Space

We provide the first sub-linear space and sub-linear regret algorithm fo...