Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem

09/07/2016
by Robin Allesiardo, et al.

We consider a non-stationary formulation of the stochastic multi-armed bandit in which the rewards are no longer assumed to be identically distributed. For the best-arm identification task, we introduce a version of Successive Elimination based on random shuffling of the K arms. We prove that, under a novel and mild assumption on the mean gap Δ, this simple but powerful modification achieves the same guarantees in terms of sample complexity and cumulative regret as the original version, but in a much wider class of problems, since it is no longer constrained to stationary distributions. We also show that the original Successive Elimination fails to have controlled regret in this more general scenario, which demonstrates the benefit of shuffling. We then remove our mild assumption and adapt the algorithm to the best-arm identification task with switching arms. We adapt the definition of sample complexity to that case and prove that, against an optimal policy with N-1 switches of the optimal arm, this new algorithm achieves an expected sample complexity of $O\big(\Delta^{-2}\sqrt{NK\delta^{-1}\log(K\delta^{-1})}\big)$, where δ is the probability of failure of the algorithm, and an expected cumulative regret of $O\big(\Delta^{-1}\sqrt{NTK\log(TK)}\big)$ after T time steps.
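
To make the shuffling idea concrete, here is a minimal sketch of Successive Elimination in which the surviving arms are played in a fresh, uniformly random order every round. It is an illustration under stated assumptions, not the authors' algorithm: the function name `shuffled_successive_elimination`, the environment callback `pull(arm, step)`, rewards in [0, 1], and the Hoeffding-style confidence radius with a union bound over arms and rounds are all hypothetical choices made here; the paper's exact thresholds and constants may differ.

```python
import math
import random
from typing import Callable, List, Optional, Tuple


def shuffled_successive_elimination(
    n_arms: int,
    pull: Callable[[int, int], float],
    delta: float = 0.05,
    max_rounds: int = 100_000,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Return (identified best arm, total number of pulls).

    `pull(arm, step)` is a hypothetical environment callback returning a
    reward in [0, 1]. `step` is the global time index, so the reward
    distribution of each arm may change over time (non-stationary).
    """
    rng = rng or random.Random()
    remaining: List[int] = list(range(n_arms))
    sums = [0.0] * n_arms
    step = 0  # global time index shared by all arms

    for t in range(1, max_rounds + 1):
        # Random shuffling: play each surviving arm once, in a new uniformly
        # random order every round, so no arm is systematically sampled at
        # the same position within a round.
        order = remaining[:]
        rng.shuffle(order)
        for arm in order:
            sums[arm] += pull(arm, step)
            step += 1

        # Every surviving arm has now been pulled exactly t times.
        means = {a: sums[a] / t for a in remaining}
        # Assumed Hoeffding radius, union-bounded over arms and rounds.
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best_mean = max(means.values())
        # Eliminate arms whose empirical mean trails the leader by >= 2 * radius.
        remaining = [a for a in remaining if best_mean - means[a] < 2 * radius]

        if len(remaining) == 1:
            return remaining[0], step

    # Budget exhausted: return the empirical leader among the survivors
    # (all survivors have the same number of pulls, so sums compare means).
    return max(remaining, key=lambda a: sums[a]), step


if __name__ == "__main__":
    base = [0.8, 0.4, 0.3]
    sampler = random.Random(0)

    def drifting_bernoulli(arm: int, step: int) -> float:
        # Hypothetical environment: every mean oscillates over time, but
        # arm 0 stays ahead of the others at every step.
        mu = base[arm] + 0.1 * math.sin(step / 50)
        return 1.0 if sampler.random() < mu else 0.0

    arm, pulls = shuffled_successive_elimination(
        3, drifting_bernoulli, rng=random.Random(1)
    )
    print(arm, pulls)
```

The only change relative to a plain round-robin Successive Elimination is the `rng.shuffle(order)` call; with a fixed playing order, an arm that is always sampled at the same offset within each round could be systematically favored or penalized when the means drift, which is the failure mode the abstract attributes to the original algorithm.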

Related research

Best-Arm Identification for Quantile Bandits with Privacy (06/11/2020)
We study the best-arm identification problem in multi-armed bandits with...

Sample Complexity of an Adversarial Attack on UCB-based Best-arm Identification Policy (09/13/2022)
In this work I study the problem of adversarial perturbations to rewards...

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits (01/17/2022)
We study the non-stationary stochastic multi-armed bandit problem, where...

Sparse Dueling Bandits (01/31/2015)
The dueling bandit problem is a variation of the classical multi-armed b...

On Slowly-varying Non-stationary Bandits (10/25/2021)
We consider minimisation of dynamic regret in non-stationary bandits wit...

ICQ: A Quantization Scheme for Best-Arm Identification Over Bit-Constrained Channels (04/30/2023)
We study the problem of best-arm identification in a distributed variant...

Generalized non-stationary bandits (02/01/2021)
In this paper, we study a non-stationary stochastic bandit problem, whic...