Non-stationary Linear Bandits Revisited

03/09/2021
by Peng Zhao, et al.

In this note, we revisit non-stationary linear bandits, a variant of stochastic linear bandits in which the underlying regression parameter varies over time. Existing studies develop various algorithms and show that they enjoy an $\mathcal{O}(T^{2/3}(1+P_T)^{1/3})$ dynamic regret, where $T$ is the time horizon and $P_T$ is the path-length that measures the fluctuation of the evolving unknown parameter. However, we discover a serious technical flaw that leaves this argument unfounded. We revisit the analysis and present a fix. Without modifying the original algorithms, we prove an $\mathcal{O}(T^{3/4}(1+P_T)^{1/4})$ dynamic regret for them, slightly worse than the originally anticipated rate. We also show some impossibility results for the key quantity in the regret analysis. Note that the above dynamic regret guarantee requires oracle knowledge of the path-length $P_T$. By combining the algorithms with the bandit-over-bandit mechanism, we also achieve the same guarantee in a parameter-free way.
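To make "slightly worse" concrete, the sketch below computes a path-length and compares the two regret orders with constants dropped. It assumes the common definition $P_T = \sum_{t=2}^{T} \lVert \theta_{t-1} - \theta_t \rVert_2$ over the unknown parameters $\theta_t$. The abstract does not spell this definition out, so treat it as an assumption. The function names are hypothetical. Dividing the two orders gives $T^{3/4}(1+P_T)^{1/4} / \big(T^{2/3}(1+P_T)^{1/3}\big) = \big(T/(1+P_T)\big)^{1/12}$. This is a small polynomial gap, and it closes as $P_T$ approaches $T$.

```python
import math
from typing import Sequence


def path_length(thetas: Sequence[Sequence[float]]) -> float:
    """Assumed definition: P_T = sum_{t=2}^{T} ||theta_{t-1} - theta_t||_2."""
    # math.dist is the Euclidean (l2) distance between two points (Python 3.8+).
    return sum(math.dist(prev, cur) for prev, cur in zip(thetas, thetas[1:]))


def claimed_rate(T: float, P_T: float) -> float:
    """Order of the originally claimed bound T^{2/3} (1 + P_T)^{1/3}, constants dropped."""
    return T ** (2 / 3) * (1 + P_T) ** (1 / 3)


def corrected_rate(T: float, P_T: float) -> float:
    """Order of the corrected bound T^{3/4} (1 + P_T)^{1/4}, constants dropped."""
    return T ** (3 / 4) * (1 + P_T) ** (1 / 4)


if __name__ == "__main__":
    # A toy 2-d parameter that jumps away and back once: two jumps of length 5.
    thetas = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (0.0, 0.0), (0.0, 0.0)]
    print(path_length(thetas))  # 10.0

    # The ratio corrected/claimed equals (T / (1 + P_T)) ** (1/12).
    T = 10 ** 6
    for P in (0.0, 10.0, 1000.0):
        print(P, corrected_rate(T, P) / claimed_rate(T, P))
```

For $T = 10^6$ and a stationary parameter ($P_T = 0$), the ratio is $10^{1/2} \approx 3.16$. The gap grows only as the twelfth root of $T/(1+P_T)$.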



research
02/23/2023

A Definition of Non-Stationary Bandits

The subject of non-stationary bandit learning has attracted much recent ...
research
10/06/2018

Learning to Optimize under Non-Stationarity

We introduce algorithms that achieve state-of-the-art dynamic regret bou...
research
12/11/2019

Near-optimal Oracle-efficient Algorithms for Stationary and Non-Stationary Stochastic Linear Bandits

We investigate the design of two algorithms that enjoy not only computat...
research
03/05/2023

Revisiting Weighted Strategy for Non-stationary Parametric Bandits

Non-stationary parametric bandits have attracted much attention recently...
research
02/02/2022

Non-Stationary Dueling Bandits

We study the non-stationary dueling bandits problem with K arms, where t...
research
05/04/2022

Second Order Path Variationals in Non-Stationary Online Learning

We consider the problem of universal dynamic regret minimization under e...
research
01/29/2019

Improved Path-length Regret Bounds for Bandits

We study adaptive regret bounds in terms of the variation of the losses ...
