Tracking Most Severe Arm Changes in Bandits

12/27/2021
by   Joe Suk, et al.

In bandits with distribution shifts, one aims to automatically detect an unknown number L of changes in the reward distribution and restart exploration when necessary. While this problem remained open for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret of √(LT), over T rounds, with no knowledge of L. However, not all distributional shifts are equally severe: for instance, if no best-arm switches occur, we cannot rule out that a regret of O(√(T)) may still be achievable. In other words, is it possible to achieve dynamic regret that scales optimally with only an unknown number of severe shifts? This has remained elusive, despite various attempts (Auer et al., 2019; Foster et al., 2020). We resolve the problem in the case of two-armed bandits: we derive an adaptive procedure that guarantees a dynamic regret of order Õ(√(L̃T)), where L̃ ≪ L captures an unknown number of severe best-arm changes, i.e., shifts with significant switches in rewards that last long enough to actually require a restart. As a consequence, for any number L of distributional shifts outside of these severe shifts, our procedure achieves regret of just Õ(√(T)) ≪ Õ(√(LT)). Finally, we note that our notion of severe shift applies in both the classical setting of stochastic switching bandits and that of adversarial bandits.


