Be Aware of Non-Stationarity: Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits

09/12/2019
by Lingda Wang, et al.

Cascading bandit (CB) is a variant of both the multi-armed bandit (MAB) and the cascade model (CM), where a learning agent aims to maximize the total reward by recommending K out of L items to a user. We focus on a common real-world scenario where the user's preference can change in a piecewise-stationary manner. Two efficient algorithms, GLRT-CascadeUCB and GLRT-CascadeKL-UCB, are developed. The key idea behind the proposed algorithms is to incorporate an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT), within classical upper confidence bound (UCB) based algorithms. Gap-dependent regret upper bounds of the proposed algorithms are derived, and both match the lower bound Ω(√T) up to a poly-logarithmic factor √(log T) in the number of time steps T. We also present numerical experiments on both synthetic and real-world datasets to show that GLRT-CascadeUCB and GLRT-CascadeKL-UCB outperform state-of-the-art algorithms in the literature.
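
To make the change-point detection idea concrete, the following is a minimal sketch of a Bernoulli GLRT detector of the kind the abstract describes, written under stated assumptions rather than taken from the paper. It scans every split point of a 0/1 feedback sequence (for example, click or no click on one item) and compares the best likelihood-ratio statistic with a threshold. The function names `bernoulli_kl`, `glr_statistic`, and `change_detected`, the default `delta`, and the simplified threshold `log(3 * n**1.5 / delta)` are illustrative assumptions; the paper's own threshold and the way the detector is coupled with the UCB indices may differ.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q).

    Both arguments are clipped away from 0 and 1 for numerical stability.
    """
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(obs):
    """Largest Bernoulli GLR statistic over all split points of obs.

    For each split s, the statistic compares the empirical means before
    and after s against the empirical mean of the whole sequence.
    """
    n = len(obs)
    total = sum(obs)
    mean_all = total / n
    best = 0.0
    left_sum = 0
    for s in range(1, n):
        left_sum += obs[s - 1]
        mean_left = left_sum / s
        mean_right = (total - left_sum) / (n - s)
        stat = (s * bernoulli_kl(mean_left, mean_all)
                + (n - s) * bernoulli_kl(mean_right, mean_all))
        best = max(best, stat)
    return best


def change_detected(obs, delta=0.05):
    """Return True if the GLR statistic reaches the threshold.

    The threshold below is a simplified, assumed form chosen for
    illustration; it is not claimed to be the one used in the paper.
    """
    n = len(obs)
    if n < 2:
        return False
    threshold = math.log(3 * n ** 1.5 / delta)
    return glr_statistic(obs) >= threshold


if __name__ == "__main__":
    # Feedback whose mean jumps from 0 to 1 halfway through.
    print(change_detected([0] * 50 + [1] * 50))
    # Feedback with a constant mean.
    print(change_detected([1] * 100))
```

In a piecewise-stationary bandit loop, one plausible use is to run such a detector on each item's observed feedback and, when it fires, discard the accumulated statistics so that the UCB indices are rebuilt for the new segment. This restart rule is stated here as an assumption about how the pieces fit together, not as the paper's exact procedure.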
