Smooth Non-Stationary Bandits

01/29/2023
by Su Jia, et al.

In many applications of online decision making, the environment is non-stationary, and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, and they guarantee regret of order T^{2/3}. In practice, however, environments often change smoothly, so such algorithms may incur higher-than-necessary regret and fail to leverage information about the rate of change. In this paper, we study a non-stationary two-armed bandit problem in which an arm's mean reward is a β-Hölder function of (normalized) time, meaning it is (β-1)-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with T^{3/5} regret for β=2. We complement this result with a T^{(β+1)/(2β+1)} lower bound for any integer β ≥ 1, which matches our upper bound for β=2.
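As a quick sanity check on these rates (an illustrative sketch, not from the paper itself), the snippet below evaluates the lower-bound exponent (β+1)/(2β+1) for small integer β: at β=1 it recovers the familiar non-smooth 2/3 rate, and at β=2 it gives the 3/5 rate matched by the paper's policy.

    from fractions import Fraction

    def lower_bound_exponent(beta: int) -> Fraction:
        """Exponent in the T^{(beta+1)/(2*beta+1)} lower bound (integer beta >= 1)."""
        if beta < 1:
            raise ValueError("beta must be a positive integer")
        return Fraction(beta + 1, 2 * beta + 1)

    for beta in (1, 2, 3):
        print(f"beta={beta}: T^{lower_bound_exponent(beta)}")
    # beta=1: T^2/3  (the classical non-smooth rate)
    # beta=2: T^3/5  (matched by the paper's upper bound)
    # beta=3: T^4/7  (lower bound only; the upper bound is matched at beta=2)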


Related research

10/25/2021  On Slowly-varying Non-stationary Bandits
We consider minimisation of dynamic regret in non-stationary bandits wit...

12/11/2020  Smooth Bandit Optimization: Generalization to Hölder Space
We consider bandit optimization of a smooth reward function, where the g...

05/25/2022  Non-stationary Bandits with Knapsacks
In this paper, we study the problem of bandits with knapsacks (BwK) in a...

10/22/2021  Break your Bandit Routine with LSD Rewards: a Last Switch Dependent Analysis of Satiation and Seasonality
Motivated by the fact that humans like some level of unpredictability or...

07/11/2023  Tracking Most Significant Shifts in Nonparametric Contextual Bandits
We study nonparametric contextual bandits where Lipschitz mean reward fu...

02/01/2021  Generalized non-stationary bandits
In this paper, we study a non-stationary stochastic bandit problem, whic...

04/28/2020  A Linear Bandit for Seasonal Environments
Contextual bandit algorithms are extremely popular and widely used in re...
