On Slowly-varying Non-stationary Bandits

We consider the minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that the arms' rewards are stochastic and independent over time, but that the absolute difference between the expected rewards of any arm at any two consecutive time-steps is at most a drift limit δ > 0. For this setting, which has received little attention in the past, we give a new algorithm that naturally extends the well-known Successive Elimination algorithm to the non-stationary bandit setting. We establish the first instance-dependent regret upper bound for slowly varying non-stationary bandits. The analysis in turn relies on a novel characterization of the instance as a detectable gap profile that depends on the expected arm reward differences. We also provide the first minimax regret lower bound for this problem, which lets us show that our algorithm is essentially minimax optimal. Moreover, this lower bound matches that of the more general total variation-budgeted bandits problem, establishing that the seemingly easier slowly varying problem is at least as hard as the more general total variation-budgeted problem in the minimax sense. We complement our theoretical results with experimental illustrations.
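To make the setting concrete, the following is a minimal Python sketch of the two objects the abstract describes: mean-reward paths whose per-step change is bounded by the drift limit δ, and the dynamic regret of a sequence of arm pulls measured against the best arm at each time-step. The function names `drifting_means` and `dynamic_regret`, the random-walk drift model, and the restriction of means to [0, 1] are illustrative assumptions and do not come from the paper. The sketch does not implement the paper's algorithm.

```python
import random


def drifting_means(num_arms, horizon, delta, seed=0):
    """Generate mean-reward paths whose per-step change is at most delta.

    Assumption for illustration: each arm's mean follows a bounded random
    walk on [0, 1]. Any path with |mu[t+1] - mu[t]| <= delta would qualify.
    """
    rng = random.Random(seed)
    means = [[rng.random() for _ in range(num_arms)]]
    for _ in range(horizon - 1):
        prev = means[-1]
        nxt = []
        for mu in prev:
            step = rng.uniform(-delta, delta)
            # Clipping to [0, 1] can only shrink the step taken from mu,
            # so the drift limit still holds after clipping.
            nxt.append(min(1.0, max(0.0, mu + step)))
        means.append(nxt)
    return means


def dynamic_regret(means, pulls):
    """Sum over time of (best mean at time t) minus (mean of the pulled arm)."""
    return sum(max(row) - row[arm] for row, arm in zip(means, pulls))


if __name__ == "__main__":
    mu = drifting_means(num_arms=3, horizon=1000, delta=0.01)
    always_arm_0 = [0] * len(mu)
    print("dynamic regret of always pulling arm 0:", dynamic_regret(mu, always_arm_0))
```

The comparator in `dynamic_regret` changes from one time-step to the next, which is what distinguishes dynamic regret from regret against a single fixed best arm.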
