On Limited-Memory Subsampling Strategies for Bandits

06/21/2021
by Dorian Baudry et al.

There has been a recent surge of interest in nonparametric bandit algorithms based on subsampling. One drawback of these approaches, however, is the additional complexity introduced by random subsampling and by storing the full history of rewards. Our first contribution is to show that a simple deterministic subsampling rule, proposed in the recent work of Baudry et al. (2020) under the name of "last-block subsampling", is asymptotically optimal in one-parameter exponential families. In addition, we prove that these guarantees continue to hold when the algorithm's memory is limited to a polylogarithmic function of the time horizon. These findings open up new perspectives, in particular for non-stationary scenarios in which the arm distributions evolve over time. We propose a variant of the algorithm in which only the most recent observations are used for subsampling, and show that it achieves optimal regret guarantees under the assumption of a known number of abrupt changes. Extensive numerical simulations highlight the merits of this approach, particularly when the changes affect more than just the means of the rewards.
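To make the idea concrete, here is a minimal sketch of one round of a last-block subsampling duel in the spirit of the abstract: the arm with the most observations acts as leader, and each challenger's empirical mean is compared against the leader's mean over only its most recent block of matching size. This is an illustrative simplification; the function name, the tie-breaking rule, and the exact duel details are assumptions and do not reproduce the precise algorithm of Baudry et al. (2020).

```python
def last_block_duel_round(histories):
    """One round of deterministic last-block subsampling duels (sketch).

    histories: list of reward lists, one per arm.
    Returns the set of arm indices to pull next.
    """
    # Leader = arm with the most observations (ties broken by empirical mean).
    leader = max(
        range(len(histories)),
        key=lambda k: (len(histories[k]), sum(histories[k]) / len(histories[k])),
    )
    to_pull = set()
    leader_rewards = histories[leader]
    for k in range(len(histories)):
        if k == leader:
            continue
        nk = len(histories[k])
        # Duel: challenger's full empirical mean vs. the leader's mean
        # computed on its *last block* of nk rewards (no random subsampling).
        challenger_mean = sum(histories[k]) / nk
        block_mean = sum(leader_rewards[-nk:]) / nk
        if challenger_mean >= block_mean:
            to_pull.add(k)
    # If every challenger loses its duel, the leader is pulled instead.
    if not to_pull:
        to_pull.add(leader)
    return to_pull
```

Because the subsampled block is always the most recent one, the rule is deterministic, and the memory-limited variant mentioned in the abstract can be obtained by truncating each history to a sliding window before the duels.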

