Revisiting Weighted Strategy for Non-stationary Parametric Bandits

03/05/2023
by Jing Wang, et al.

Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity: sliding-window, weighted, and restart strategies. Because many non-stationary environments exhibit gradual drifting patterns, the weighted strategy is commonly adopted in real-world applications. However, previous theoretical studies show that its analysis is more involved and that the resulting algorithms are either computationally less efficient or statistically suboptimal. This paper revisits the weighted strategy for non-stationary parametric bandits. For linear bandits (LB), we find that this undesirable feature stems from an inadequate regret analysis, which leads to an overly complex algorithm design. We propose a refined analysis framework that simplifies the derivation and, importantly, yields a simpler weight-based algorithm that is as efficient as window/restart-based algorithms while retaining the same regret bound as previous studies. Furthermore, the new framework can be used to improve the regret bounds of other parametric bandits, including Generalized Linear Bandits (GLB) and Self-Concordant Bandits (SCB). For example, we develop a simple weighted GLB algorithm with an O(k_μ^{5/4} c_μ^{-3/4} d^{3/4} P_T^{1/4} T^{3/4}) regret, improving on the O(k_μ^{2} c_μ^{-1} d^{9/10} P_T^{1/5} T^{4/5}) bound in prior work, where k_μ and c_μ characterize the nonlinearity of the reward model, P_T measures the degree of non-stationarity, and d and T denote the dimension and the time horizon, respectively.
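
To make the weighted strategy concrete, here is a minimal sketch of a discounted (weight-based) ridge-regression learner for non-stationary linear bandits. It is illustrative only and not the paper's exact algorithm; the discount factor `gamma`, regularization `lam`, and confidence-radius scale `beta` are hypothetical tuning parameters.

```python
import numpy as np

class WeightedLinUCB:
    """Illustrative weight-based learner for non-stationary linear bandits.

    Past observations are down-weighted geometrically by `gamma`, so the
    least-squares estimate tracks a slowly drifting parameter. This is a
    sketch under assumed parameter choices, not the paper's algorithm.
    """

    def __init__(self, dim, gamma=0.99, lam=1.0, beta=1.0):
        self.dim = dim
        self.gamma = gamma              # discount factor in (0, 1)
        self.lam = lam                  # ridge regularization
        self.beta = beta                # exploration-bonus scale
        self.A = np.zeros((dim, dim))   # discounted Gram matrix (without ridge)
        self.b = np.zeros(dim)          # discounted response vector

    def select(self, arms):
        """Pick the arm with the largest optimistic (UCB-style) score."""
        V_inv = np.linalg.inv(self.A + self.lam * np.eye(self.dim))
        theta = V_inv @ self.b          # weighted ridge-regression estimate
        scores = [x @ theta + self.beta * np.sqrt(x @ V_inv @ x) for x in arms]
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Discount old statistics, then add the new observation."""
        self.A = self.gamma * self.A + np.outer(x, x)
        self.b = self.gamma * self.b + reward * x
```

Older observations are geometrically down-weighted in both the Gram matrix and the response vector, which is what lets the estimator follow a gradually drifting parameter without restarting or maintaining a sliding window of past data.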


