Regret Bounds for Generalized Linear Bandits under Parameter Drift

03/09/2021 ∙ by Louis Faury, et al.
Generalized Linear Bandits (GLBs) are powerful extensions of the Linear Bandit (LB) setting, broadening the benefits of reward parametrization beyond linearity. In this paper we study GLBs in non-stationary environments, characterized by a general metric of non-stationarity known as the variation budget or parameter drift, denoted B_T. While previous attempts have been made to extend LB algorithms to this setting, they overlook a salient feature of GLBs, and this oversight invalidates their results. In this work, we introduce a new algorithm that addresses this difficulty. We prove that under a geometric assumption on the action set, our approach enjoys an Õ(B_T^(1/3) T^(2/3)) regret bound. In the general case, we show that it suffers at most Õ(B_T^(1/5) T^(4/5)) regret. At the core of our contribution is a generalization of the projection step introduced in Filippi et al. (2010), adapted to the non-stationary nature of the problem. Our analysis sheds light on central mechanisms inherited from the non-stationary setting by explicitly splitting the treatment of the learning and tracking aspects of the problem.
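To make the two ingredients mentioned above concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): a discounted maximum-likelihood estimate for a logistic GLB, which handles parameter drift by geometrically down-weighting old samples, followed by a projection step in the spirit of Filippi et al. (2010), which maps the unconstrained estimate back into an admissible parameter ball. The function names, the logistic link, the forgetting factor `gamma`, and the radius `S` are all illustrative assumptions.

```python
# Hypothetical sketch of non-stationary GLB estimation: discounted
# (forgetting-weighted) MLE + a Filippi-style projection step.
# All names and constants are illustrative, not from the paper.
import numpy as np
from scipy.optimize import minimize

def mu(z):
    """Logistic link function."""
    return 1.0 / (1.0 + np.exp(-z))

def weighted_mle(actions, rewards, gamma=0.95, reg=1.0):
    """Discounted regularized MLE for a logistic GLB.

    Older samples receive geometrically smaller weights so the
    estimate can track a drifting parameter theta_t."""
    T, d = actions.shape
    w = gamma ** np.arange(T - 1, -1, -1)  # most recent sample has weight 1

    def nll(theta):
        z = actions @ theta
        # weighted logistic negative log-likelihood + ridge penalty
        return np.sum(w * (np.log1p(np.exp(z)) - rewards * z)) \
            + 0.5 * reg * theta @ theta

    res = minimize(nll, np.zeros(d), method="BFGS")
    return res.x, w

def project(theta_hat, actions, w, S=1.0, reg=1.0):
    """Project the unconstrained estimate onto {||theta|| <= S}.

    The projection minimizes the distance measured through the
    data-dependent map g(theta) = sum_s w_s mu(x_s' theta) x_s + reg*theta,
    in the spirit of the projection step of Filippi et al. (2010)."""
    def g(theta):
        return (w * mu(actions @ theta)) @ actions + reg * theta

    target = g(theta_hat)

    def obj(theta):
        diff = g(theta) - target
        return diff @ diff

    cons = {"type": "ineq", "fun": lambda th: S**2 - th @ th}
    res = minimize(obj, np.zeros(len(theta_hat)),
                   constraints=cons, method="SLSQP")
    return res.x
```

In a bandit loop, one would recompute the weighted estimate after each round, project it, and build confidence sets around the projected parameter; the forgetting factor trades off tracking speed against statistical efficiency.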

