Weighted Linear Bandits for Non-Stationary Environments

09/19/2019
by   Yoan Russac, et al.

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order $d^{2/3} B_T^{1/3} T^{2/3}$, where $B_T$ is a measure of non-stationarity ($d$ and $T$ being, respectively, dimension and horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments.
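To make the idea of "smoothly forgetting the past" concrete, the following is a minimal sketch of a discounted (exponentially weighted) ridge regression estimator in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class name `DiscountedLeastSquares`, the helper `solve`, the demo function `track_abrupt_change`, the recursive form of the update, and the values of `gamma` and `lam` are all chosen here for illustration. The sketch covers only the estimator, not the optimistic action selection of D-LinUCB.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


class DiscountedLeastSquares:
    """Hypothetical helper: ridge regression where an observation made
    k steps ago carries weight gamma**k (assumed form of the weighting)."""

    def __init__(self, dim, gamma=0.9, lam=1.0):
        self.dim, self.gamma, self.lam = dim, gamma, lam
        # V starts as lam * identity; b starts as the zero vector.
        self.V = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, action, reward):
        g, lam = self.gamma, self.lam
        for i in range(self.dim):
            for j in range(self.dim):
                # Adding (1 - g) * lam keeps the ridge term at lam after discounting.
                ridge = (1.0 - g) * lam if i == j else 0.0
                self.V[i][j] = g * self.V[i][j] + action[i] * action[j] + ridge
            self.b[i] = g * self.b[i] + reward * action[i]

    def estimate(self):
        # Weighted least-squares estimate: solve V @ theta = b.
        return solve(self.V, self.b)


def track_abrupt_change(gamma, horizon=200, change_point=100, lam=0.01):
    """Noise-free toy run: the parameter jumps from (1, 0) to (0, 1)."""
    est = DiscountedLeastSquares(dim=2, gamma=gamma, lam=lam)
    for t in range(horizon):
        theta = [1.0, 0.0] if t < change_point else [0.0, 1.0]
        action = [1.0, 0.0] if t % 2 == 0 else [0.0, 1.0]
        reward = sum(a * th for a, th in zip(action, theta))
        est.update(action, reward)
    return est.estimate()


print(track_abrupt_change(gamma=0.9))  # discounted: follows the new parameter
print(track_abrupt_change(gamma=1.0))  # undiscounted: averages both regimes
```

In this toy run, the discounted estimator ends close to the post-change parameter (0, 1), while setting `gamma=1.0` recovers ordinary ridge regression, which ends near (0.5, 0.5) because it weights both regimes equally. A smaller `gamma` forgets faster but uses fewer effective samples, which is the trade-off the regret analysis has to balance.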

research
02/13/2020

Multiscale Non-stationary Stochastic Bandits

Classic contextual bandit algorithms for linear models, such as LinUCB, ...
research
03/23/2020

Algorithms for Non-Stationary Generalized Linear Bandits

The statistical framework of Generalized Linear Models (GLM) can be appl...
research
07/07/2023

BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits

We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Boun...
research
11/02/2020

Self-Concordant Analysis of Generalized Linear Bandits with Forgetting

Contextual sequential decision problems with categorical or numerical ob...
research
10/11/2019

Robust Hierarchical-Optimization RLS Against Sparse Outliers

This paper fortifies the recently introduced hierarchical-optimization r...
research
09/19/2022

A Multi-Layer Regression based Predicable Function Fitting Network

Function plays an important role in mathematics and many science branche...
research
07/06/2021

Weighted Gaussian Process Bandits for Non-stationary Environments

In this paper, we consider the Gaussian process (GP) bandit optimization...
