Discounted Thompson Sampling for Non-Stationary Bandit Problems

05/18/2023
by Han Qi, et al.

Non-stationary multi-armed bandit (NS-MAB) problems have recently received significant attention. NS-MAB problems are typically modelled in two scenarios: abruptly changing, where reward distributions remain constant for a certain period and change at unknown time steps, and smoothly changing, where reward distributions evolve smoothly according to unknown dynamics. In this paper, we propose Discounted Thompson Sampling (DS-TS) with Gaussian priors to address both non-stationary settings. Our algorithm passively adapts to changes by incorporating a discount factor into Thompson Sampling. While DS-TS has been experimentally validated in prior work, an analysis of its regret upper bound has been lacking. Under mild assumptions, we show that DS-TS with Gaussian priors achieves a nearly optimal regret bound on the order of Õ(√(TB_T)) for abruptly changing environments and Õ(T^β) for smoothly changing environments, where T is the number of time steps, B_T is the number of breakpoints, β is a parameter of the smoothly changing environment, and Õ hides factors independent of T as well as logarithmic terms. Furthermore, empirical comparisons between DS-TS and other non-stationary bandit algorithms demonstrate its competitive performance. In particular, when prior knowledge of the maximum expected reward is available, DS-TS can outperform state-of-the-art algorithms.
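The core mechanism described in the abstract, a discount factor applied to Thompson Sampling statistics, can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: the posterior form, the `sigma` scale, and the prior-width cap are all assumptions made for the sketch. Each arm keeps a discounted pull count and a discounted reward sum; at every step a Gaussian posterior sample is drawn per arm, the argmax arm is played, and all statistics are multiplied by the discount factor so that old observations fade and the sampler can track changing reward distributions.

```python
import numpy as np

def ds_ts(reward_fn, n_arms, horizon, gamma=0.95, sigma=0.5, seed=0):
    """Illustrative sketch of Discounted Thompson Sampling with
    Gaussian priors (assumed form; the paper's exact posterior
    updates and constants may differ).

    reward_fn: callable(arm, t) -> observed reward at step t.
    gamma:     discount factor in (0, 1]; smaller values forget faster.
    """
    rng = np.random.default_rng(seed)
    N = np.zeros(n_arms)   # discounted pull counts
    S = np.zeros(n_arms)   # discounted reward sums
    choices = []
    for t in range(horizon):
        # Gaussian posterior sample per arm: mean is the discounted
        # empirical mean, std shrinks as the discounted count grows.
        mean = S / np.maximum(N, 1e-12)
        std = sigma / np.sqrt(np.maximum(N, 1e-12))
        std = np.minimum(std, 1e3)  # cap the effectively-infinite prior std
        theta = rng.normal(mean, std)
        arm = int(np.argmax(theta))
        r = reward_fn(arm, t)
        # Discount all old statistics, then add the new observation.
        N *= gamma
        S *= gamma
        N[arm] += 1.0
        S[arm] += r
        choices.append(arm)
    return choices

# Usage: an abruptly changing two-armed bandit whose best arm
# swaps at the midpoint of the horizon.
choices = ds_ts(lambda a, t: 0.9 if (a == 0) == (t < 500) else 0.1,
                n_arms=2, horizon=1000)
```

With gamma = 0.95 the effective memory is roughly 1/(1 − gamma) = 20 steps, which is what lets the sampler recover after a breakpoint instead of being anchored to the full history as standard Thompson Sampling would be.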


