Multi-armed bandits (Thompson, 1933; Cesa-Bianchi and Lugosi, 2006; Bubeck and Cesa-Bianchi, 2012; Lattimore and Szepesvári, 2019) formalize the core aspects of the exploration-exploitation dilemma in online learning, where an agent has to trade off the exploration of the environment to gather information and the exploitation of the current knowledge to maximize the reward. In the stochastic setting (Thompson, 1933; Auer et al., 2002a), each arm is characterized by a stationary reward distribution, and whenever an agent pulls an arm, it observes an i.i.d. sample from the corresponding distribution. Despite the extensive algorithmic and theoretical study of this setting (Cesa-Bianchi and Lugosi, 2006; Bubeck and Cesa-Bianchi, 2012; Kaufmann et al., 2012; Garivier and Cappé, 2011), the stationarity assumption is often too restrictive in practice, since the value of the arms may change over time (e.g., a change in the preferences of users). The adversarial setting (Auer et al., 2002b) addresses this limitation by removing any assumption on how the rewards are generated: learning agents should be able to perform well for any arbitrary sequence of rewards. While algorithms such as Exp3 (Auer et al., 2002b) are guaranteed to achieve small regret in this setting, their behavior is conservative, as all arms are repeatedly explored to avoid incurring too much regret from unexpected changes in the arms' values. This leads to unsatisfactory performance in practice, where arms' values, while non-stationary, are far from adversarial. Garivier and Moulines (2011) proposed a variation of the stochastic setting, where the distribution of each arm is piecewise stationary. Similarly, Besbes et al. (2014) introduced an adversarial setting where the total amount of change in the arms' values is bounded.
While these settings effectively capture the characteristics of a wide set of applications, they consider the case where the arms' values evolve independently of the decisions of the agent. This setting is often called restless bandits. On the other hand, in many problems the value of an arm changes only when it is pulled, and we then talk about rested bandits. For instance, the value of a service may deteriorate only when it is actually used. Similarly, if a recommender system always shows the same item to the users, they get bored and enjoy their experience on the platform less. Finally, a student can master a frequently taught topic in an intelligent tutoring system, so that extra learning on that topic becomes less effective. A particularly interesting case is represented by the rotting bandits, where the value of an arm decreases every time it is pulled. More precisely, each expected reward is non-increasing, since it could also remain constant at each pull. Heidari et al. (2016) studied this problem in the case where the rewards observed by the agent are deterministic (i.e., no noise) and showed that a greedy policy (i.e., selecting the arm that returned the largest reward the last time it was pulled) is optimal up to a small constant factor depending on the number of arms and the largest per-round decay in the arms' value. Bouneffouf and Féraud (2016) considered the stochastic setting when the dynamics of the rewards are known up to a constant factor. Finally, Levine et al. (2017) defined both non-parametric and parametric noisy rotting bandits, for which they derive new algorithms with regret guarantees. In particular, in the non-parametric case, where the decrease in reward is neither constrained nor known, they introduce the sliding-window average (wSWA) algorithm, which is shown to achieve a regret to the optimal policy of order $\widetilde{O}(\mu_{\max}^{1/3} K^{1/3} T^{2/3})$, where $T$ is the number of rounds, $K$ the number of arms, and $\mu_{\max}$ a bound on the largest expected reward.
In this paper, we study the non-parametric rotting setting of Levine et al. (2017) and introduce the Filtering on Expanding Window Average (FEWA) algorithm, a novel method that at each round constructs moving-average estimates with different windows to identify the arms that are most likely to perform well if pulled once more. Under the assumption that the reward decays are bounded by $L$, we show that FEWA achieves a regret of $\widetilde{O}(\sqrt{KT})$ without any prior knowledge of $L$, thus significantly improving over wSWA and matching the minimax rate of stochastic bandits up to a logarithmic factor. This shows that learning with non-increasing rewards is not more difficult than in the constant case (the stochastic setting). Furthermore, when rewards are constant, we recover standard problem-dependent UCB regret guarantees (up to constants), while in the rotting bandit scenario with no noise, the regret reduces to the one derived by Heidari et al. (2016). Finally, numerical simulations confirm our theoretical results and show the superiority of FEWA over wSWA.
We consider a rotting-bandit setting similar to the one introduced by Levine et al. (2017). At each round $t$, an agent chooses an arm $i(t)$ among $K$ arms and receives a noisy reward $r_t$. Unlike in standard bandits, the reward associated with each arm $i$ is a $\sigma^2$-sub-Gaussian random variable with an expected value $\mu_i(n)$ which depends on the number of times $n$ the arm was pulled before, e.g., $\mu_i(0)$ is the expectation at the beginning.¹ More formally, let $\{(i(s), r_s)\}_{s \le t}$ be the sequence of arms pulled and rewards observed over time until round $t$; then $$r_t = \mu_{i(t)}\big(N_{i(t),t}\big) + \varepsilon_t,$$ (¹Our definition of $\mu_i(n)$ slightly differs from Levine et al. (2017), where it denotes the expected value of arm $i$ when it is pulled for the $n$-th time instead of after $n$ pulls. As a result, in Levine et al. (2017), $n$ is defined from $1$, while with our notation it actually starts from $0$.)
where $N_{i,t} = \sum_{s=1}^{t-1} \mathbb{1}\{i(s) = i\}$ is the number of times arm $i$ is pulled before round $t$ and $\varepsilon_t$ is an independent $\sigma^2$-sub-Gaussian noise. In the following, we also denote by $r_{i,n}$ the random reward obtained from arm $i$ when it is pulled for the $n$-th time. We finally introduce a non-parametric rotting assumption with bounded decay.
Assumption 1. The reward functions $\mu_i$ are non-increasing, with bounded decays: for all arms $i$ and all $n$, $\mu_i(n) - \mu_i(n+1) \in [0, L]$. For the sake of the analysis, we also assume that the first pull is bounded: $\mu_i(0) \in [0, L]$. We refer to this set of functions as $\mathcal{L}_L$.
Similarly to Levine et al. (2017), we consider non-increasing functions, where the value of an arm can only decrease when it is pulled. However, we do not restrict the functions to stay positive; instead, we bound the per-round decay by $L$. On one hand, any function in $\mathcal{L}_L$ has its range bounded in $[-LT, L]$. Therefore, up to a shift, our setting is included in the setting of Levine et al. (2017). However, the regret of wSWA, defined below in Equation 2, is bounded by $\widetilde{O}(\mu_{\max}^{1/3} K^{1/3} T^{2/3})$, which becomes $\widetilde{O}(L^{1/3} K^{1/3} T)$ in our setting since $\mu_{\max}$ scales with $LT$. Therefore, wSWA is not proved to learn in our setting. On the other hand, any decreasing function with range in $[0, \mu_{\max}]$ is included in $\mathcal{L}_L$ for $L = \mu_{\max}$. Therefore, our analysis applies directly to the setting of Levine et al. (2017) by simply setting $L = \mu_{\max}$, where we get a regret bound of $\widetilde{O}(\sqrt{KT})$, thereby significantly improving the rate of their result.
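To make the rested rotting dynamics concrete, the following minimal simulator (an illustrative sketch, not the paper's code; the decay functions, noise level, and seed are arbitrary choices) implements the protocol above: an arm's mean depends only on its own pull count, and pulls return that mean plus sub-Gaussian (here Gaussian) noise.

```python
import numpy as np

class RottingBandit:
    """Minimal rested rotting-bandit environment (illustrative sketch).

    Arm i has a non-increasing expected reward mu_i(n) that depends only
    on the number of times n the arm has been pulled so far.
    """

    def __init__(self, mu_funcs, sigma=1.0, rng=None):
        self.mu_funcs = mu_funcs          # list of callables n -> mu_i(n)
        self.sigma = sigma                # noise scale
        self.pulls = [0] * len(mu_funcs)  # N_{i,t}: pulls of arm i so far
        self.rng = rng or np.random.default_rng(0)

    def pull(self, i):
        mean = self.mu_funcs[i](self.pulls[i])  # mean after N_{i,t} pulls
        self.pulls[i] += 1                      # the arm "rots" only when pulled
        return mean + self.sigma * self.rng.normal()

# Example: arm 0 is constant, arm 1 decays by 0.1 per pull (so L = 0.1 here).
env = RottingBandit([lambda n: 0.5, lambda n: 1.0 - 0.1 * n], sigma=0.1)
r = env.pull(1)   # first pull of arm 1 has mean mu_1(0) = 1.0
```

With `sigma=0` the environment reduces to the deterministic case of Heidari et al. (2016).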
The learning problem
In general, an agent's policy $\pi$ returns the arm to pull at round $t$ on the basis of the whole history of observations, i.e., $i(t) = \pi(H_{t-1})$. In the following, we use $\pi(t)$ as a shorthand notation for $\pi(H_{t-1})$. The performance of a policy is measured by the (expected) rewards accumulated over time, $$J_T(\pi) = \sum_{t=1}^{T} \mu_{\pi(t)}\big(N_{\pi(t),t}\big).$$
Since $\pi$ depends on the (random) history observed over time, $J_T(\pi)$ is also random. We therefore define the expected cumulative reward as $\bar{J}_T(\pi) = \mathbb{E}[J_T(\pi)]$. We restate a useful characterization of the optimal policy given by Heidari et al. (2016).
Proposition 1 (Heidari et al., 2016). If the (exact) mean of each arm is known in advance for any number of pulls, then the optimal policy $\pi^\star$ maximizing the expected cumulative reward is greedy at each round, i.e., $$\pi^\star(t) \in \arg\max_{i}\ \mu_i\big(N_{i,t}\big).$$
We denote by $J_T^\star = J_T(\pi^\star)$ the cumulative reward of the optimal policy $\pi^\star$.
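Since the optimal policy characterized above is greedy on the known means, the optimal cumulative reward can be computed exactly with a priority queue over the arms' upcoming values. The heap-based implementation below is our illustration (assuming known, noiseless means), not the paper's code.

```python
import heapq

def greedy_oracle_reward(mu_funcs, T):
    """Cumulative reward of the greedy oracle: at each round, pull the arm
    with the largest upcoming mean mu_i(N_i). Illustrative sketch assuming
    the mean functions are known."""
    # heapq is a min-heap, so store negated values: (-mu_i(n), arm, n).
    heap = [(-f(0), i, 0) for i, f in enumerate(mu_funcs)]
    heapq.heapify(heap)
    total = 0.0
    for _ in range(T):
        neg, i, n = heapq.heappop(heap)
        total += -neg                                   # collect mu_i(n)
        heapq.heappush(heap, (-mu_funcs[i](n + 1), i, n + 1))
    return total
```

For example, with a decaying arm `1.0 - 0.5*n` and a constant arm `0.6`, the oracle pulls the decaying arm once and then switches to the constant one.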
The objective of a learning algorithm is to implement a policy $\pi$ whose performance is as close to that of $\pi^\star$ as possible. We define the (random) regret as $$R_T(\pi) = J_T^\star - J_T(\pi). \qquad (2)$$
Notice that the regret is measured against an optimal allocation over arms rather than a fixed-arm policy, as is the case in adversarial and stochastic bandits. Therefore, even the adversarial algorithms that one could think of applying in our setting (e.g., Exp3 of Auer et al., 2002b) are not known to provide any guarantee for our definition of regret. On the other hand, for constant $\mu_i$, our problem reduces to standard stochastic bandits, and our regret definition reduces to the standard stochastic regret. Thus, for constant functions, any algorithm with a guarantee on the rotting regret immediately inherits the same guarantee for the standard regret.
Let $N^\star_{i,T}$ be the (deterministic) number of times that arm $i$ is pulled by the optimal policy up to round $T$ (excluded). Similarly, for a given policy $\pi$, let $N_{i,T}$ be the (random) number of pulls of arm $i$. Using this notation, notice that the cumulative reward can be rewritten as $$J_T(\pi) = \sum_{i=1}^{K} \sum_{n=0}^{N_{i,T}-1} \mu_i(n).$$
Then, we can conveniently rewrite the regret as $$R_T(\pi) = \sum_{i \in \mathrm{up}} \sum_{n=N_{i,T}}^{N^\star_{i,T}-1} \mu_i(n) \;-\; \sum_{i \in \mathrm{op}} \sum_{n=N^\star_{i,T}}^{N_{i,T}-1} \mu_i(n), \qquad (3)$$
where $\mathrm{up} = \{i : N_{i,T} < N^\star_{i,T}\}$ and $\mathrm{op} = \{i : N_{i,T} > N^\star_{i,T}\}$ are the sets of arms that are respectively under-pulled and over-pulled by $\pi$ w.r.t. the optimal policy.
Prior regret bounds
In order to ease the discussion of the theoretical results we derive in Sect. 4, we restate prior results for two special cases. We start with the minimax regret lower bound for stochastic bandits, which corresponds to the case when the expected rewards are constant.
Proposition 2 (Auer et al., 2002b, Thm. 5.1). For any learning policy $\pi$ and any horizon $T$, there exists a stochastic stationary problem with $K$ sub-Gaussian arms with parameter $\sigma$ such that $\pi$ suffers an expected regret $$\mathbb{E}[R_T(\pi)] = \Omega\big(\sigma\sqrt{KT}\big),$$
where the expectation is taken with respect to both the randomization over rewards and the algorithm's internal randomization.
Next, Heidari et al. (2016) derived lower and upper bounds for the regret in the case of deterministic rotting bandits (i.e., $\sigma = 0$).
Proposition 3 (Heidari et al., 2016). Let $\pi^G$ be a greedy (not necessarily oracle) policy that selects at each round the arm with the largest upcoming reward. For any deterministic rotting bandit problem (i.e., $\sigma = 0$) satisfying Assumption 1 with bounded decay $L$, $\pi^G$ suffers a regret of at most $O(KL)$, independently of the horizon $T$.
Propositions 2 and 3 bound the performance of any algorithm on the constant and deterministic classes of problems, with respective parameters $(\sigma, L = 0)$ and $(\sigma = 0, L)$. Note that any problem in one of these two classes is also a rotting problem with parameters $(\sigma, L)$. Therefore, the performance of any algorithm on the rotting problem described above is also subject to both lower bounds.
3 FEWA: Filtering on Expanding Window Average
Since the expected rewards change over time, the main difficulty in the non-parametric rotting bandit setting introduced in the previous section is that we cannot entirely rely on all the samples observed until time $t$ to accurately predict which arm is likely to return the highest reward in the future. In particular, the older a sample, the less representative it is of the reward that the agent may observe by pulling the same arm once again. This suggests constructing estimates from the most recent samples. On the other hand, by discarding older rewards, we also reduce the number of samples used in the estimates, thus increasing their variance. In Algorithm 1, we introduce a novel algorithm (FEWA) that, at each round $t$, relies on estimates using windows of increasing length to filter out arms that are suboptimal with high probability, and then pulls the least pulled arm among the remaining ones.
Before describing FEWA in detail, we first describe the subroutine Filter in Algorithm 2, which receives as input a set of active arms $\mathcal{K}_h$, a window $h$, and a confidence parameter $\delta$, and returns an updated set of arms $\mathcal{K}_{h+1}$. For each arm $i$ that has been pulled at least $h$ times, the algorithm constructs an estimate $\widehat{\mu}_i^h$ that averages the $h$ most recent rewards observed from $i$. The estimator is well defined only for $h \le N_{i,t}$. Nonetheless, the construction of the active set and the stopping condition at Line 10 in Algorithm 1 guarantee that the estimates are always well defined for the arms in $\mathcal{K}_h$. The subroutine Filter then discards from $\mathcal{K}_h$ all the arms whose mean estimate (built with window $h$) is lower than that of the empirically best arm by more than twice a threshold $c(h, \delta)$ constructed from a standard Hoeffding concentration inequality (Proposition 4).
The Filter subroutine is used in FEWA to incrementally refine the set of active arms, starting with a window of size $h = 1$, until the condition at Line 10 is met. As a result, $\mathcal{K}_h$ only contains arms that passed the filter for all windows from $1$ up to $h$. Notice that it is crucial to start filtering arms from a small window and to keep refining the previous set of active arms, instead of completely recomputing it for every new window $h$. In fact, the estimates constructed using a small window use recent rewards, which are closer to the future value of an arm. As a result, if there is enough evidence that an arm is suboptimal already at a small window $h$, then there is no reason to consider it again for larger windows. On the other hand, a suboptimal arm may pass the filter for small windows, as the threshold $c(h, \delta)$ is large for small $h$, i.e., when only a few samples are used in constructing $\widehat{\mu}_i^h$. Thus, FEWA keeps refining $\mathcal{K}_h$ for larger and larger windows in the attempt to construct more and more accurate estimates and discard more suboptimal arms. This process stops when we reach a window as large as the number of samples of at least one arm in the active set (i.e., Line 10). At this point, increasing $h$ would not bring any additional evidence that could refine the active set further,² and FEWA finally selects the active arm whose number of samples matches the current window, i.e., the least pulled arm in $\mathcal{K}_h$. The set of available rewards and the numbers of pulls are then updated accordingly. (²$\widehat{\mu}_i^h$ is not defined for $h > N_{i,t}$.)
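The expanding-window filtering loop just described can be sketched as follows. This is our reading of the description, not the paper's exact pseudocode: the constants inside the confidence band and the choice $\delta_t = t^{-\alpha}$ are assumptions of the sketch.

```python
import math
import numpy as np

def c_band(h, t, sigma=1.0, alpha=4.0):
    """Hoeffding-style confidence band for a window of h samples at round t.
    The exact constants are an assumption of this sketch."""
    return math.sqrt(2 * sigma**2 * alpha * math.log(max(t, 2)) / h)

def fewa_choose(rewards, t, sigma=1.0, alpha=4.0):
    """One round of FEWA (sketch). `rewards[i]` lists the rewards observed
    so far from arm i; returns the arm to pull at round t."""
    K = len(rewards)
    for i in range(K):                 # pull each arm once before filtering
        if not rewards[i]:
            return i
    active = set(range(K))
    h = 1
    while True:
        # Filter: drop arms whose h-window average trails the best by > 2c.
        means = {i: np.mean(rewards[i][-h:]) for i in active}
        best = max(means.values())
        thr = 2 * c_band(h, t, sigma, alpha)
        active = {i for i in active if means[i] >= best - thr}
        # Stop when the window reaches the sample count of an active arm,
        # then play the least pulled arm in the active set.
        if any(len(rewards[i]) == h for i in active):
            return min(active, key=lambda i: len(rewards[i]))
        h += 1
```

Note that an active arm always has at least $h$ samples: the loop stops no later than when $h$ reaches the sample count of the least pulled active arm, so the window averages are always well defined.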
Theorem 1 shows that FEWA achieves a $\widetilde{O}(\sqrt{KT})$ regret without any knowledge of the size of the decay $L$. This significantly improves over the regret of wSWA (Levine et al., 2017), which is of order $\widetilde{O}(\mu_{\max}^{1/3} K^{1/3} T^{2/3})$ and needs to know $\mu_{\max}$. The improvement is also due to the fact that FEWA exploits filters using moving averages with increasing windows to discard arms that are suboptimal with high probability. Since this process is repeated at each round, FEWA smoothly tracks changes in the value of each arm, so that if an arm becomes worse later on, other arms can be recovered and pulled again. On the other hand, wSWA relies on a fixed exploratory phase where all arms are pulled in a round-robin fashion, and the tracking is performed using averages constructed with a fixed window. Furthermore, while the performance of wSWA can be optimized with prior knowledge of the range of the expected rewards (see the tuning of $\alpha$ in the work of Levine et al. 2017, Theorem 3.1), FEWA does not require any such knowledge to achieve its regret bound. Moreover, FEWA is naturally anytime ($T$ does not need to be known), while the fixed exploratory phase of wSWA requires $T$ to be properly tuned, and wSWA needs to resort to a doubling trick to be anytime. Algorithms (such as FEWA) with direct anytime guarantees have a practical advantage over doubling-trick ones, which often exhibit suboptimal empirical performance.
For $\sigma = 0$, our upper bound reduces to $O(KL)$, thus matching the prior (upper and lower) bound of Heidari et al. (2016) for deterministic rotting bandits. Moreover, the additive decomposition of the regret shows that there is no coupling between the stochastic problem and the rotting problem, as the $\sigma$-dependent terms are summed with the $L$-dependent term, while wSWA exhibits a multiplicative $\mu_{\max}^{1/3}$ factor⁴ in front of the leading term. Finally, the $\widetilde{O}(\sqrt{KT})$ rate matches the worst-case optimal regret bound of standard stochastic bandits (i.e., when the $\mu_i$s are constant) up to a logarithmic factor. Whether an algorithm can achieve a $O(\sqrt{KT})$ regret bound without the logarithmic factor is an open question. On one hand, FEWA uses many more confidence bounds than UCB1 to track the change of each arm. Thus, FEWA needs larger confidence bands in order to make all the bounds hold with high probability. Therefore, we pay an extra exploration cost, which may be necessary for handling the possible rotting behavior of the arms. On the other hand, our worst-case analysis shows that some of the difficult problems that attain the worst-case bound of Theorem 1 are realized with constant functions, i.e., standard stochastic bandits. For standard stochastic bandits, it is known that MOSS-like strategies (Audibert and Bubeck, 2009) achieve regret guarantees without the extra logarithmic factor. To sum up, the necessity of the extra logarithmic factor in the worst-case regret of rotting bandits remains an open problem. (⁴Specifically, it is $\widetilde{O}(\mu_{\max}^{1/3} K^{1/3} T^{2/3})$, where $\mu_{\max}$ is equivalent to $LT$ in our setting, though our setting is more general, as explained in the remark following Assumption 1.)
4.1 Sketch of the proof
In this section, we give a sketch of the proof of the regret bound. We first introduce the expected values of the estimators used in FEWA. For any arm $i$, number of pulls $n$, and window $h \le n$, we define $$\bar{\mu}_i^h(n) = \frac{1}{h} \sum_{j=n-h}^{n-1} \mu_i(j).$$
Notice that if at round $t$ the number of pulls of arm $i$ is $N_{i,t}$, then $\bar{\mu}_i^1(N_{i,t}) = \mu_i(N_{i,t}-1)$, which is the expected value of arm $i$ the last time it was pulled. We now state Hoeffding's concentration inequality and the favorable events that we consider throughout the analysis.
Proposition 4. For any fixed arm $i$, number of pulls $n$, and window $h \le n$, we have, with probability at least $1 - 2\delta$, $$\big|\widehat{\mu}_i^h - \bar{\mu}_i^h(n)\big| \le c(h, \delta) := \sqrt{\frac{2\sigma^2}{h}\log\frac{1}{\delta}}.$$
Furthermore, for any round $t$ and a confidence $\delta_t$, let $$\xi_t = \Big\{ \forall i,\ \forall h \le N_{i,t}:\ \big|\widehat{\mu}_i^h - \bar{\mu}_i^h(N_{i,t})\big| \le c(h, \delta_t) \Big\}$$
be the event under which all the possible estimates constructed by FEWA at round $t$ are well concentrated around their expected values. Then, taking a union bound over the at most $Kt$ estimates, $\mathbb{P}(\xi_t) \ge 1 - 2Kt\,\delta_t$.
Quality of arms in the active set
We are now ready to derive a crucial lemma that provides support to the arm selection process implemented by FEWA through the series of refinements obtained by the Filter subroutine. Recall that at any round $t$, after the pulls $(N_{1,t}, \dots, N_{K,t})$, the greedy (oracle) policy would select an arm characterized by $$i^+_t \in \arg\max_i\ \mu_i\big(N_{i,t}\big).$$
We denote by $\mu^+_t = \max_i \mu_i(N_{i,t})$ the expected reward that such an oracle policy would obtain by pulling $i^+_t$. Notice that the dependence on the numbers of pulls $N_{i,t}$ in the definition of $\mu^+_t$ is due to the fact that we consider what the deterministic oracle policy would do at the state reached by FEWA. While FEWA cannot directly target the performance of the greedy arm, the following lemma shows that the average of the last pulls of any arm in the active set returned by the filter is close to the performance of the current best arm, up to four times the confidence band $c(h, \delta_t)$.
Lemma 1. On the favorable event $\xi_t$, if an arm $i$ passes through a filter of window $h$ at round $t$, the average of its last $h$ pulls cannot deviate significantly from the value of the best available arm at that round, i.e., $$\bar{\mu}_i^h(N_{i,t}) \ge \mu^+_t - 4c(h, \delta_t).$$
Relating FEWA to the optimal policy. While Lemma 1 (with proof in the appendix) provides a first link between the value of the arms returned by the filter and the greedy arm, $\mu^+_t$ is still defined according to the numbers of pulls obtained by FEWA up to round $t$. On the other hand, the optimal policy could actually pull a different sequence of arms and reach different numbers of pulls at $t$. In order to bound the regret, we need to relate the actual performance of the optimal policy to the value of the arms pulled by FEWA. We let $d_i = |N^\star_{i,T} - N_{i,T}|$ be the absolute difference in the numbers of pulls between FEWA and the optimal policy. Since $\sum_i N_{i,T} = \sum_i N^\star_{i,T} = T$, we have $\sum_{i \in \mathrm{up}} d_i = \sum_{i \in \mathrm{op}} d_i$, which means that there are as many overpulls as underpulls over all arms. Let $j \in \arg\max_{i \in \mathrm{up}} \mu_i(N_{i,T})$ be an underpulled arm of largest upcoming value.⁵ (⁵If no such arm exists, then $\pi$ suffers no regret.) Then, since the $\mu_i$s are non-increasing, we have the inequalities $$\sum_{i \in \mathrm{up}} \sum_{n=N_{i,T}}^{N^\star_{i,T}-1} \mu_i(n) \;\le\; \Big(\sum_{i \in \mathrm{up}} d_i\Big)\, \mu_j\big(N_{j,T}\big) \;=\; \Big(\sum_{i \in \mathrm{op}} d_i\Big)\, \mu_j\big(N_{j,T}\big).$$
As a consequence, we derive from Equation 3 a first upper bound on the regret, $$R_T(\pi) \;\le\; \sum_{i \in \mathrm{op}} \sum_{n=N^\star_{i,T}}^{N_{i,T}-1} \Big( \mu_j\big(N_{j,T}\big) - \mu_i(n) \Big),$$
where the inequality is obtained by bounding each term of the first summation by $\mu_j(N_{j,T})$,⁶ and then using $\sum_{i \in \mathrm{up}} d_i = \sum_{i \in \mathrm{op}} d_i$ to redistribute these terms over the overpulled arms. (⁶Notice that since the $\mu_i$s are non-increasing, the inequality directly follows from the definition of $j$.) While the previous expression shows that we can now focus only on the over-pulled arms in $\mathrm{op}$, it is still difficult to directly control the expected reward $\mu_j(N_{j,T})$, as it may change at each round (by at most $L$). Nonetheless, we notice that cumulative sums of expected rewards can be directly linked to the average of the expected rewards over a suitable window. In fact, for any arm $i$ and any window $h \le N_{i,T}$, we have $$\sum_{n=N_{i,T}-h}^{N_{i,T}-1} \mu_i(n) \;=\; h\, \bar{\mu}_i^h\big(N_{i,T}\big).$$
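The identity above is simply "the sum of the means over a window equals the window length times the windowed average"; a quick numeric check under an arbitrary non-increasing reward function:

```python
def window_avg(mu, n, h):
    """bar{mu}^h(n): average of mu over the h pulls preceding the n-th."""
    return sum(mu(j) for j in range(n - h, n)) / h

mu = lambda n: 1.0 - 0.1 * n          # an arbitrary non-increasing mu_i

# Window of size 1 recovers the mean of the last pull: bar{mu}^1(n) = mu(n-1).
assert window_avg(mu, 7, 1) == mu(6)

# Sum over the last h pulls equals h times the windowed average.
N, h = 7, 3
assert abs(sum(mu(n) for n in range(N - h, N)) - h * window_avg(mu, N, h)) < 1e-12
```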
Lemma 2. Let $i$ be an arm overpulled by FEWA at round $t$ and let $h_{i,t} = N_{i,t} - N^\star_{i,T} \ge 1$ be the difference in the number of pulls w.r.t. the optimal policy at round $t$. On the favorable event $\xi_t$, we have $$\bar{\mu}_i^{h_{i,t}}\big(N_{i,t}\big) \;\ge\; \mu^+_t - 4c\big(h_{i,t}, \delta_t\big).$$
4.2 Discussion on problem-dependent result and the price of decaying rewards
Since our setting generalizes the standard stochastic bandit setting, where the $\mu_i$s are constant over pulls, a natural question is whether we pay any price for this generalization. While the result of Levine et al. (2017) suggested that learning in rotting bandits could be more difficult, in Theorem 1 we proved that FEWA matches the minimax regret of multi-armed bandits.
However, we may now wonder whether FEWA also matches the result of, e.g., UCB in terms of problem-dependent regret. As illustrated in the next remark, we show that, up to constants, FEWA performs as well as UCB on any stochastic problem.
Remark 1. If we apply the result of Corollary 1 to stochastic bandits, i.e., when the $\mu_i$s are constant and $\Delta_i$ denotes the gap of arm $i$ to the best arm, we get that $$R_T(\pi_{\mathrm{FEWA}}) = O\bigg( \sum_{i:\,\Delta_i > 0} \frac{\sigma^2 \log T}{\Delta_i} \bigg).$$
Therefore, our algorithm matches the lower bound of Lai and Robbins (1985) up to a constant. Moreover, in the case of constant functions, our upper bound for FEWA is at most a constant factor larger than the one for UCB1 (Auer et al., 2002a).⁷ (⁷To make the results comparable, we need to replace the confidence bound in the proof of Auer et al. (2002a) by one adapted to $\sigma^2$-sub-Gaussian noise.) The main source of suboptimality is the use of confidence-bound filtering instead of an upper-confidence index policy. Selecting the least pulled arm in the active set is conservative, as it requires uniform exploration until elimination, resulting in a factor 4 in the confidence-bound guarantee on the selected arm (versus 2 for UCB), which implies 4 times more overpulls than UCB (see Equation 8). We conjecture this may not be necessary, and it is an open question whether it is possible to derive either an index policy or a selection rule that is better than pulling the least pulled arm in the active set. The other source of suboptimality w.r.t. UCB is the use of larger confidence bands, because (1) a higher number of estimators is computed at each round (up to $N_{i,t}$ per arm instead of one for UCB) and because (2) the regret at each round in the worst case grows with $t$, which requires reducing the probability of the unfavorable event.
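For reference, the UCB1-style baseline in this comparison can be sketched as an index policy with a width adapted to $\sigma^2$-sub-Gaussian noise (the constant inside the square root is an assumption of this sketch, not the exact one from Auer et al., 2002a):

```python
import math

def ucb1_choose(counts, sums, t, sigma=1.0):
    """UCB1-style index policy for sigma^2-sub-Gaussian rewards (sketch).
    counts[i] and sums[i] are the pulls of and total reward from arm i."""
    # Pull every arm once first.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Index: empirical mean plus a sigma-scaled exploration width.
    def index(i):
        width = sigma * math.sqrt(8 * math.log(t) / counts[i])
        return sums[i] / counts[i] + width
    return max(range(len(counts)), key=index)
```

Unlike FEWA, the index directly selects the arm with the highest optimistic value, which is the source of the factor-of-2 versus factor-of-4 discussion above.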
As a result of Remark 1, we claim that, surprisingly and contrary to what the prior work (Levine et al., 2017) suggests, rotting bandits are not significantly more difficult than multi-armed bandits with constant mean rewards. This observation is not only theoretical: in Section 5, we show that in our experiments, the empirical regret of FEWA was at most twice as large as that of UCB1.
Remark 1 also reveals that Corollary 1 is in fact a problem-dependent result. Just as we derived a problem-dependent bound on FEWA's regret for constant functions (standard stochastic bandits), we now show a way to obtain a similar problem-dependent bound for the general case. In particular, with Corollary 1, we upper-bound the maximum number of overpulls by a problem-dependent quantity.
Corollary 2 (problem-dependent guarantee).
For any rotting problem satisfying Assumption 1, the regret is bounded in terms of this problem-dependent quantity.
4.3 Runtime and memory usage
At each round $t$, FEWA has a worst-case time and memory complexity of $O(t)$. In fact, it needs to store and update up to $N_{i,t}$ averages per arm. Since moving from an average computed on window $h$ to one computed on window $h+1$ can be done at a cost $O(1)$, the per-round complexity is $O(t)$. Such complexity may be undesirable.⁸ (⁸This observation is worst-case. In fact, in some cases, the number of samples for the suboptimal arms may be much smaller than $t$; for example, in standard bandits it could be of order $\log t$. This would dramatically reduce the number of means to compute at each round.)
The first idea to improve the time and memory complexity is to reduce the number of filters used in the selection. We first notice that the selectivity of the filters scales with $c(h, \delta) \propto 1/\sqrt{h}$. As a result, when $h$ increases, the usefulness of consecutive filters decreases. This remark suggests that we could replace the window increment $h \leftarrow h + 1$ (Line 9 of Algorithm 1) by a geometric update $h \leftarrow 2h$, in order to keep a constant ratio between two consecutive selectivity values. However, this is not enough to reduce the amount of computation: we still have to compute a logarithmic number of averages of up to $t$ samples, and therefore we still pay $O(t)$ in time and memory. We therefore provide a more efficient version of FEWA, called EFF-FEWA (Appendix E), which also uses a logarithmic number of filters (handling the expanding dynamics), but now with precomputed statistics (handling the sliding dynamics) that are only updated when the number of samples for a particular arm doubles. Specifically, the precomputed statistics are updated with a delay, so that the statistic of the $j$-th filter is representative of exactly $2^j$ samples. For instance, the (two) statistics of length $4$ are replaced every $4$ pulls, while the statistics of length $8$ are replaced every $8$ pulls. Therefore, each filter $j$ needs to store only two statistics for each arm $i$: the currently used one and the pending one. Therefore, at any time, the $j$-th filter is fed, for all arms, with averages of $2^j$ consecutive samples among the last $2^{j+1}$ ones. In the worst case, the most recent samples are not yet covered by filter $j$, but these samples are necessarily covered by all the filters before it. This way, EFF-FEWA recovers the same bound as FEWA up to a constant factor (proof in Appendix E). In contrast, the small number of filters can now be updated sporadically, thus reducing the per-round time and space complexity to only $O(\log t)$ per arm. A similar yet different idea from the one we propose here appeared independently in the context of stream mining (Bifet and Gavaldà, 2007).
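The delayed, doubling update schedule of EFF-FEWA's precomputed statistics can be sketched per arm as follows (an illustrative sketch: the data structure and names are ours, not the paper's pseudocode). Each scale $j$ keeps a completed average of $2^j$ consecutive samples and fills a pending block; the pending block replaces the completed one every $2^j$ pulls.

```python
class GeometricWindowStats:
    """Per-arm statistics for the EFF-FEWA idea (illustrative sketch).

    current[j] holds the latest completed average of 2**j consecutive
    samples; pending[j] accumulates the next block. Each completed average
    thus covers 2**j consecutive samples among the last 2**(j+1) observed.
    """

    def __init__(self):
        self.n = 0           # total samples seen for this arm
        self.current = []    # current[j]: completed 2**j-average (or None)
        self.pending = []    # pending[j]: [running_sum, count]

    def add(self, reward):
        self.n += 1
        # Open a new scale j the first time the sample count reaches 2**j.
        if self.n == 2 ** len(self.pending):
            self.pending.append([0.0, 0])
            self.current.append(None)
        for j, block in enumerate(self.pending):
            block[0] += reward
            block[1] += 1
            if block[1] == 2 ** j:          # pending block full: promote it
                self.current[j] = block[0] / 2 ** j
                self.pending[j] = [0.0, 0]
```

After four samples 1, 2, 3, 4, scale 0 holds the last sample (4.0), scale 1 holds the average of samples 2 and 3 (2.5), and scale 2 is still filling.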
5 Numerical simulations
In this section, we report numerical simulations designed to provide insight into the differences between wSWA and FEWA. We consider rotting bandits with two arms: the first arm has a constant mean reward, while the second undergoes a single abrupt decrease after a fixed number of its own pulls.
The rewards are then generated by applying i.i.d. Gaussian noise of standard deviation $\sigma$. The single point of non-stationarity in the second arm is designed to satisfy Assumption 1 with a bounded decay $L$. The size of the drop has been chosen so as not to advantage FEWA, which pulls each arm equally often when no arm is filtered. In the two-arm setting defined above, the optimal allocation $(N^\star_{1,T}, N^\star_{2,T})$ follows from the greedy oracle characterization of Heidari et al. (2016).
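The exact gap and change point used in the experiment are not reproduced above; the sketch below, with hypothetical placeholder values, shows how such a two-arm single-drop instance and its noisy rewards can be generated.

```python
import numpy as np

def two_arm_drop_instance(gap, change_point):
    """Mean functions for a two-arm single-drop rotting instance.
    Both arguments are hypothetical placeholders, not the paper's values:
    arm 1 is constant at 0, arm 2 drops by `gap` after `change_point`
    of its own pulls (so the per-pull decay L equals `gap`)."""
    mu1 = lambda n: 0.0
    mu2 = lambda n: gap / 2 if n < change_point else -gap / 2
    return [mu1, mu2]

def sample_rewards(mu, pulls, sigma, rng):
    """Noisy rewards for a fixed pull schedule (list of arm indices)."""
    counts = [0] * len(mu)
    out = []
    for i in pulls:
        out.append(mu[i](counts[i]) + sigma * rng.normal())
        counts[i] += 1
    return out
```

Setting `sigma=0` makes the rewards deterministic, which is convenient for checking the drop location.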
Both algorithms have a parameter $\alpha$ to tune. In wSWA, $\alpha$ is a multiplicative constant for the theoretically optimal window. We try four different values of $\alpha$, including the one recommended by Levine et al. (2017). In FEWA, $\alpha$ tunes the confidence $\delta_t = t^{-\alpha}$ of the threshold $c(h, \delta_t)$. While our analysis suggests a large value of $\alpha$ (a smaller one suffices for bounded variables), Hoeffding confidence intervals, union bounds, and filtering algorithms are too conservative for a typical case. Therefore, we use a more aggressive value of $\alpha$. While Theorem 1 suggests that the performance of FEWA should only mildly depend on the bounded decay $L$, Theorem 3.1 of Levine et al. (2017) displays a linear dependence on the largest expected reward $\mu_{\max}$. Their Theorem 3.1 also states that this linear dependence appears for larger horizons when $\mu_{\max}$ is small.
In Figure 1, we validate the difference between the two algorithms and their dependence on $L$. The first plot shows the regret at the final round for various values of $L$ and the different algorithms. The second and the third plots show the regret as a function of the number of rounds for two values of $L$, which correspond respectively to the worst-case performance for FEWA and to the large-decay regime. All our experiments are run for the same horizon $T$, and we average the results over multiple independent runs.
Before discussing the results, we point out that in the rotting setting, the regret can both increase and decrease over time. Consider two simple policies: $\pi_1$, which first pulls arm 1 for half of the rounds and then pulls arm 2 for the remaining rounds, and $\pi_2$, which reverses the order (first arm 2 and then arm 1). If we take $\pi_2$ as reference, $\pi_1$ would have an increasing regret during the first phase, which would reverse back to 0 at the switching time, since $\pi_1$ would then start collecting the rewards of the still-fresh arm 2, while $\pi_2$ (which had already exhausted arm 2) transitioned to pulling arm 1 with a lower reward.
As illustrated in Theorem 3.1 of Levine et al. (2017), the regret of wSWA scales linearly with $\mu_{\max}$ when the latter is large. In Figure 1 (left), we show that this regime effectively depends on $\alpha$: the smaller the $\alpha$, the smaller the averaging window, and the more reactive the algorithm is to large drops (see Figure 1, right). On the other hand, FEWA ends up making a single mistake for large $L$. Therefore, it recovers a regret with no dependence on the horizon, as Heidari et al. (2016) do in the deterministic case. Indeed, when $L$ is large, Corollary 2 shows that the $L$-dependent term is the leading one for any reasonable horizon.
For small $L$ (Figure 1, middle), wSWA is competitive only when $\alpha$ is sufficiently large. We see that the value of $\alpha$ recommended by Levine et al. (2017) is indeed a good choice for a long initial phase, even though it quickly becomes suboptimal afterwards. For FEWA, small values of $L$ correspond to the hardest problems, as suggested by Theorem 1. We conclude that FEWA is more robust than wSWA, as it almost always achieves the best performance across different problems while being agnostic to the value of $L$. On the other hand, wSWA's performance is very sensitive to the choice of $\alpha$, and the same value of the parameter may lead to significantly different performance depending on $L$. Finally, we notice that EFF-FEWA has a regret comparable to FEWA when $L$ is large, while for small values of $L$, EFF-FEWA suffers the cost of the delay in its statistics updates, which is largest for the last filter.
We also tested our algorithm in a rotting setting with 10 arms: the mean of one arm is constant with value 0, while each of the other 9 arms abruptly drops from a positive to a negative value after 1000 of its own pulls, with drop sizes ranging from 0.001 to 10 in a geometric sequence. Figure 2 shows the regret of the different algorithms. Besides FEWA and the four instances of wSWA, we add SW-UCB and D-UCB (Garivier and Moulines, 2011), with window and discount parameters tuned to achieve their best performance. While these two algorithms are known benchmarks for non-stationary bandits, they are designed for the restless case. Therefore, they keep exploring arms that have not been pulled for many rounds. This behavior is suboptimal for the rested bandits that we consider here, as the arms stay constant when they are not pulled.
We see that after each abrupt drop, FEWA is among the best algorithms at quickly recovering and adapting to the new situation. EFF-FEWA shows a similar performance after big drops, as its statistics are not too delayed on new samples. However, the effect of delayed updates has a larger impact in situations where many samples are needed to filter an arm. Therefore, we observe a larger regret at the end of the game compared to FEWA. wSWA with large $\alpha$ uses windows that are too large and therefore, for very big changes in the mean reward, suffers a high empirical regret at the beginning of this game. On the other hand, wSWA with small $\alpha$ suffers a larger empirical regret at the end of this game, where it is blind to small differences between arms, as its window size is too small. We conclude that the fixed-size windows used by wSWA make it difficult for the algorithm to adapt to different situations. Moreover, when $\alpha$ is too large, wSWA is very sensitive to its doubling trick.
We remark that SW-UCB and D-UCB show similar behavior. They are both heavily penalized by their restless forgetting, even though their forgetting parameters (window and discount) are optimally tuned for this experimental setup. Indeed, there is no good choice of parameters: a fast forgetting rate makes the policies repeatedly pull bad arms (whose mean rewards do not change when they are not pulled in our rested setup), while a slow forgetting rate prevents the policies from adapting to abrupt shifts.
Finally, in Figure 3, we compare the performance of FEWA against UCB1 (Auer et al., 2002a) on two-arm stochastic bandits with different gaps. These experiments confirm the theoretical findings of Theorem 1 and Corollary 2: FEWA has a performance comparable to UCB1. In particular, both algorithms exhibit a logarithmic asymptotic behavior, and the empirical ratio between the regrets of the two algorithms is well below the constant factor separating the two upper bounds. This shows the ability of FEWA to be competitive for stochastic bandits.
6 Conclusion and discussion
We introduced FEWA, a novel algorithm for the non-parametric rotting bandits. We proved that FEWA achieves a $\widetilde{O}(\sqrt{KT})$ regret without any knowledge of the decays, by using moving averages with a window that effectively adapts to the changes in the expected rewards. This result greatly improves over the wSWA algorithm proposed by Levine et al. (2017), which suffers a regret of order $\widetilde{O}(\mu_{\max}^{1/3} K^{1/3} T^{2/3})$. Our analysis of FEWA is quite non-standard, as FEWA hinges on the adaptive nature of the window size. The most interesting aspect of the proof technique (which may be of independent interest) is that confidence bounds are used not only for the arm selection, but also for the data selection, i.e., to identify the best window to trade off the bias and the variance in estimating the current value of each arm. Furthermore, we showed that in the case of constant arms, FEWA recovers the performance of UCB1, while in the deterministic case we match the performance of Heidari et al. (2016).
Acknowledgments
The research presented was supported by European CHIST-ERA project DELTA, French Ministry of Higher Education and Research, Nord-Pas-de-Calais Regional Council, Inria and Otto-von-Guericke-Universität Magdeburg associated-team north-European project Allocate, and French National Research Agency projects ExTra-Learn (n.ANR-14-CE24-0010-01) and BoB (n.ANR-16-CE23-0003). The work of A. Carpentier is also partially supported by the Deutsche Forschungsgemeinschaft (DFG) Emmy Noether grant MuSyAD (CA 1488/1-1), by the DFG - 314838170, GRK 2297 MathCoRe, by the DFG GRK 2433 DAEDALUS, by the DFG CRC 1294 Data Assimilation, Project A03, and by the UFA-DFH through the French-German Doktorandenkolleg CDFA 01-18. This research has also benefited from the support of the FMJH Program PGMO and from the support to this program from CRITEO. Part of the computational experiments was conducted using the Grid’5000 experimental testbed (https://www.grid5000.fr).
References
- Audibert and Bubeck (2009) Jean-Yves Audibert and Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. In Conference on Learning Theory, 2009.
- Auer et al. (2002a) Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002a.
- Auer et al. (2002b) Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002b.
- Besbes et al. (2014) Omar Besbes, Yonatan Gur, and Assaf Zeevi. Stochastic multi-armed bandit problem with non-stationary rewards. In Neural Information Processing Systems, 2014.
- Bifet and Gavaldà (2007) Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. In International Conference on Data Mining, 2007.
- Bouneffouf and Féraud (2016) Djallel Bouneffouf and Raphael Féraud. Multi-armed bandit problem with known trend. Neurocomputing, 205(C):16–21, 2016.
- Bubeck and Cesa-Bianchi (2012) Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5:1–122, 2012.
- Cesa-Bianchi and Lugosi (2006) Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
- Garivier and Cappé (2011) Aurélien Garivier and Olivier Cappé. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Conference on Learning Theory, 2011.
- Garivier and Moulines (2011) Aurélien Garivier and Eric Moulines. On upper-confidence-bound policies for switching bandit problems. In Algorithmic Learning Theory, 2011.
- Heidari et al. (2016) Hoda Heidari, Michael Kearns, and Aaron Roth. Tight policy regret bounds for improving and decaying bandits. In International Conference on Artificial Intelligence and Statistics, 2016.
- Kaufmann et al. (2012) Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier. On Bayesian upper confidence bounds for bandit problems. In International Conference on Artificial Intelligence and Statistics, 2012.
- Lai and Robbins (1985) Tze L. Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
- Lattimore and Szepesvári (2019) Tor Lattimore and Csaba Szepesvári. Bandit algorithms. 2019.
- Levine et al. (2017) Nir Levine, Koby Crammer, and Shie Mannor. Rotting bandits. In Neural Information Processing Systems, 2017.
- Thompson (1933) William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285–294, 1933.
Appendix A Proof of core FEWA guarantees
Let be an arm that passed a filter of window at round . First, we use the confidence bound for the estimates and we pay the cost of keeping all the arms up to a distance of ,
where in the last inequality we used that for all
Second, since the means of arms are decaying, we know that
Third, we show that the largest average of the last means of arms in is increasing with ,
To show the above property, we remark that, thanks to our selection rule, the arm that has the largest average of means always passes the filter. Formally, let . Then, for such , we have
where the first and the third inequalities are due to the confidence bounds on the estimates, while the second one is due to the definition of .
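With assumed notation, since the original display is not preserved in this version ($\hat{\mu}_i^h$ the average of the last $h$ observed rewards of arm $i$, $\overline{\mu}_i^h$ the average of its last $h$ mean rewards, $i^{+}$ the arm maximizing $\overline{\mu}_i^h$, and $c(h,\delta)$ the confidence width), the chain of inequalities has the form:

```latex
\hat{\mu}_{i^{+}}^{h}
\;\geq\; \overline{\mu}_{i^{+}}^{h} - c(h,\delta)
\;\geq\; \overline{\mu}_{i}^{h} - c(h,\delta)
\;\geq\; \hat{\mu}_{i}^{h} - 2c(h,\delta)
\qquad \text{for every arm } i,
```

so that $i^{+}$ satisfies the filter condition $\hat{\mu}_{i^{+}}^{h} \geq \max_{i} \hat{\mu}_{i}^{h} - 2c(h,\delta)$ and is never discarded.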
Appendix B Proofs of auxiliary results
Let . For any policy , the regret at round T is no bigger than
We refer to the first sum above as and to the second one as .
We consider the regret at round . From Equation 3, the decomposition of regret in terms of overpulls and underpulls gives
In order to separate the analysis for each arm, we upper-bound all the rewards in the first sum by their maximum . This upper bound is tight for a problem-independent bound, because in the worst case one cannot hope that the unexplored reward decays so as to reduce the regret. We also notice that there are as many terms in the first double sum (the number of underpulls) as in the second one (the number of overpulls). This number is equal to . Notice that this does not mean that for each arm , the number of overpulls equals the number of underpulls, which cannot happen anyway, since an arm cannot be simultaneously underpulled and overpulled. Therefore, we keep only the second double sum,
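Writing $N^{\pi}_{i,T}$ and $N^{\star}_{i,T}$ for the number of pulls of arm $i$ by policy $\pi$ and by the optimal policy (symbols assumed here, since the displays are not preserved), the counting argument reads:

```latex
\sum_{i=1}^{K} \big(N^{\star}_{i,T} - N^{\pi}_{i,T}\big)^{+}
\;=\;
\sum_{i=1}^{K} \big(N^{\pi}_{i,T} - N^{\star}_{i,T}\big)^{+},
\qquad \text{since} \quad
\sum_{i=1}^{K} N^{\pi}_{i,T} \;=\; \sum_{i=1}^{K} N^{\star}_{i,T} \;=\; T.
```

Both policies make exactly $T$ pulls in total, so the deficits and the excesses must balance across arms, even though no single arm can contribute to both sums.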
Then, we need to separate the overpulls that occur under from those under . We introduce , the round at which pulls arm for the -th time. We now make explicit the round at which each overpull occurs,
For the analysis of the pulls done under , we do not need to know at which rounds they were done. Therefore,
For FEWA, it is not easy to directly guarantee a low probability of overpulls (the second sum). Thus, we upper-bound the regret of each overpull at round under by its maximum value . While this is done to ease the analysis of FEWA, it is valid for any policy . Then, noticing that there can be at most one overpull per round , i.e., , we get
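The step above amounts to a bound of the following form, with $\mu^{\max}$ an upper bound on the per-round regret and $\overline{\xi}_t$ the failure event at round $t$ (notation assumed here, as the displays are not preserved):

```latex
\mathbb{E}\Bigg[\sum_{t=1}^{T} \mu^{\max}\,\mathbb{1}\{\overline{\xi}_{t}\}\Bigg]
\;\leq\; \mu^{\max} \sum_{t=1}^{T} \mathbb{P}\big(\overline{\xi}_{t}\big)
\;\leq\; \mu^{\max}\, T\, \max_{t\leq T}\mathbb{P}\big(\overline{\xi}_{t}\big),
```

so that choosing the confidence parameter small enough makes the contribution of the failure event negligible compared to the main term.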
Therefore, we conclude that
Let . For policy with parameters (, ), defined in Lemma 2 is upper-bounded by
First, we define , the last overpull of arm pulled at round under . Now, we upper-bound by including all the overpulls of arm until the -th overpull, even the ones under ,
where . We can therefore split the second sum of the term above into two parts: the first part corresponds to the first (possibly zero) terms (overpulling differences), and the second part to the last, -th one. Recalling that at round , arm was selected under , we apply Corollary 1 to bound the regret caused by the previous overpulls of (possibly none),
with . The second inequality holds because is decreasing and is decreasing as well. The last inequality follows from the definition of the confidence interval in Proposition 4, with for . If and , then
since and and by the assumptions of our setting. Otherwise, we can decompose
For term , since arm was overpulled at least once by FEWA, it passed at least the first filter. Since this -th overpull is done under , by Lemma 1 we have that
The second difference cannot exceed , since by the assumptions of our setting the maximum decay in one round is bounded. Therefore, we further upper-bound Equation 17 as
Let . Thus, with and , we can use Proposition 4 and get
Appendix C Minimax regret analysis of FEWA
To get the problem-independent upper bound for FEWA, we need to upper-bound the regret by quantities that do not depend on . The proof is based on Lemma 2, where we bound the expected values of the terms and from the statement of the lemma. We start by noting that on the high-probability event , we have by Lemma 3, and that
Since and there are at most overpulled arms, we can upper-bound the number of terms in the above sum by . Next, the total number of overpulls cannot exceed . Since the square-root function is concave, we can use Jensen's inequality. Moreover, we can deduce that the worst allocation of overpulls is the uniform one, i.e.,
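Denoting by $h_{i}$ the number of overpulls of arm $i$ (a symbol assumed here, as the displays are not preserved), with $\sum_{i} h_{i} \leq T$, Jensen's inequality applied to the concave square root gives:

```latex
\sum_{i=1}^{K} \sqrt{h_{i}}
\;=\; K \cdot \frac{1}{K}\sum_{i=1}^{K} \sqrt{h_{i}}
\;\leq\; K \sqrt{\frac{1}{K}\sum_{i=1}^{K} h_{i}}
\;\leq\; \sqrt{K\,T},
```

and the first inequality holds with equality for the uniform allocation $h_{i} = T/K$, which is therefore the worst case.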