A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits

09/06/2020
by   Gourab Ghatak, et al.

We consider a non-stationary two-armed bandit framework and propose a change-detection based Thompson sampling (TS) algorithm, named TS with change-detection (TS-CD), to keep track of the dynamic environment. The non-stationarity is modeled using a Poisson arrival process, in which each arrival changes the mean of the rewards. The proposed strategy compares the empirical mean of the recent rewards of an arm with the estimate of the mean of the rewards from its history. It detects a change when the empirical mean deviates from the mean estimate by more than a threshold. We then characterize a lower bound on the duration of the time window for which the bandit framework must remain stationary for TS-CD to successfully detect a change when it occurs. Consequently, our results yield an upper bound on the parameter of the Poisson arrival process for which TS-CD achieves asymptotic regret optimality with high probability. Finally, we validate the efficacy of TS-CD by testing it for edge-control of radio access technique (RAT)-selection in a wireless network. Our results show that TS-CD outperforms not only the classical max-power RAT selection strategy but also other actively adaptive and passively adaptive bandit algorithms that are designed for non-stationary environments.
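The detection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Bernoulli rewards with Beta(1, 1) priors, a fixed-length window of recent rewards, and a reset of only the affected arm's posterior when a change is flagged. The class name, the `window` and `threshold` parameters, and the reset policy are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class ChangeDetectingThompsonSampler:
    """Illustrative sketch only (assumed: Bernoulli rewards, Beta(1, 1) priors)."""

    def __init__(self, n_arms=2, window=20, threshold=0.5, seed=0):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.window = window          # hypothetical: size of the "recent" window
        self.threshold = threshold    # hypothetical: allowed deviation of the means
        self.detections = 0
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms
        self.recent = [deque(maxlen=window) for _ in range(n_arms)]
        self.history_sum = [0.0] * n_arms
        self.history_n = [0] * n_arms

    def select_arm(self):
        # Thompson sampling: draw one sample per arm from its Beta posterior
        # and play the arm with the largest sample.
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, arm, reward):
        """Record a 0/1 reward; return True if a change was flagged for this arm."""
        recent = self.recent[arm]
        if len(recent) == self.window:
            # The oldest recent reward is about to be evicted; it joins the history.
            self.history_sum[arm] += recent[0]
            self.history_n[arm] += 1
        recent.append(reward)
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward

        # Compare the two means only once both estimates rest on enough samples.
        if len(recent) == self.window and self.history_n[arm] >= self.window:
            recent_mean = sum(recent) / self.window
            history_mean = self.history_sum[arm] / self.history_n[arm]
            if abs(recent_mean - history_mean) > self.threshold:
                self._reset(arm)
                return True
        return False

    def _reset(self, arm):
        # Assumed reset policy: forget everything learned about this arm.
        self.detections += 1
        self.alpha[arm] = 1.0
        self.beta[arm] = 1.0
        self.recent[arm].clear()
        self.history_sum[arm] = 0.0
        self.history_n[arm] = 0
```

In a simulation loop, one would call `select_arm()`, observe a reward from the environment, and pass it to `update()`. Because the sketch requires a full window of history in addition to a full recent window before it tests for a change, it cannot flag changes that arrive faster than that, which mirrors the abstract's point that the environment must stay stationary for a minimum duration for detection to succeed.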


