Tracking Most Significant Shifts in Nonparametric Contextual Bandits

07/11/2023
by Joe Suk, et al.

We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of the number of changes L and the total variation V, both of which capture all changes in distribution over the context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we turn to the question of adaptivity for this setting, i.e., achieving the minimax rate without knowledge of L or V. Quite importantly, we posit that the bandit problem, viewed locally at a given context X_t, should not be affected by reward changes in other parts of the context space X. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality and thus counts considerably fewer changes than L and V. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.
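The locality idea behind experienced significant shifts can be illustrated with a minimal sketch (the reward functions below are illustrative choices, not from the paper): a reward change confined to one region of the context space causes a best-arm switch only for contexts observed in that region, so contexts elsewhere never experience the shift.

```python
import numpy as np

T = 10_000  # horizon (illustrative)

# Two Lipschitz mean-reward functions over the context space X = [0, 1].
# After time t = T // 2, arm 1's reward changes, but ONLY on the left
# half of the context space, so the best arm for contexts in [0.5, 1]
# never changes.
def mean_reward(arm, x, t):
    if arm == 0:
        return 0.5 + 0.2 * x            # static arm
    if t < T // 2 or x >= 0.5:
        return 0.4 + 0.2 * x            # arm 1 before the shift
    return 0.8 - 0.2 * x                # arm 1 after a shift on [0, 0.5)

def best_arm(x, t):
    return max((0, 1), key=lambda a: mean_reward(a, x, t))

# A context on the right half experiences no significant shift: its best
# arm is arm 0 throughout, since 0.5 + 0.2x > 0.4 + 0.2x for all x.
print(best_arm(0.75, 0), best_arm(0.75, T - 1))   # same arm both times

# A context on the left half DOES experience a best-arm switch, so a
# global change count (L or V) registers a change for both contexts,
# while the experienced notion charges it only to the left region.
print(best_arm(0.25, 0), best_arm(0.25, T - 1))   # best arm switches
```

Under global measures like L and V, a single such change is counted against the whole context space; the experienced notion, in contrast, charges it only to learners who actually observe contexts in the affected region.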


Related research

- Tracking Most Severe Arm Changes in Bandits (12/27/2021): In bandits with distribution shifts, one aims to automatically detect an...
- Smooth Non-Stationary Bandits (01/29/2023): In many applications of online decision making, the environment is non-s...
- On Slowly-varying Non-stationary Bandits (10/25/2021): We consider minimisation of dynamic regret in non-stationary bandits wit...
- Self-Tuning Bandits over Unknown Covariate-Shifts (07/16/2020): Bandits with covariates, a.k.a. contextual bandits, address situations w...
- Regret Minimization in Isotonic, Heavy-Tailed Contextual Bandits via Adaptive Confidence Bands (10/19/2021): In this paper we initiate a study of nonparametric contextual bandits u...
- A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits (09/06/2020): We consider a non-stationary two-armed bandit framework and propose a ch...
