A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free

02/03/2019
by Yifang Chen, et al.

We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal in terms of dynamic regret. Specifically, our algorithm achieves dynamic regret O(min{√(ST), Δ^(1/3) T^(2/3)}) for a contextual bandit problem with T rounds, S switches, and total variation Δ in the data distributions. Importantly, our algorithm is adaptive: it does not need to know S or Δ ahead of time, and it can be implemented efficiently assuming access to an ERM oracle. Our results strictly improve the O(min{S^(1/4) T^(3/4), Δ^(1/5) T^(4/5)}) bound of (Luo et al., 2018), and greatly generalize and improve the O(√(ST)) result of (Auer et al., 2018), which holds only for the two-armed bandit problem without contextual information. The key novelty of our algorithm is the introduction of replay phases, in which the algorithm acts according to its previous decisions for a certain amount of time in order to detect non-stationarity while maintaining a good balance between exploration and exploitation.
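
To get a feel for how the two rates in the abstract compare, the sketch below evaluates both expressions numerically. It is only an illustration: constants and logarithmic factors are dropped, the function names `new_bound` and `luo_bound` are hypothetical, and the values of T, S, and Δ are made up for the example rather than taken from the paper.

```python
import math


def new_bound(T: float, S: float, delta: float) -> float:
    """min{sqrt(S*T), delta^(1/3) * T^(2/3)}, ignoring constants and log factors."""
    return min(math.sqrt(S * T), delta ** (1 / 3) * T ** (2 / 3))


def luo_bound(T: float, S: float, delta: float) -> float:
    """min{S^(1/4) * T^(3/4), delta^(1/5) * T^(4/5)}, ignoring constants and log factors."""
    return min(S ** 0.25 * T ** 0.75, delta ** 0.2 * T ** 0.8)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not from the paper).
    T, S, delta = 10_000, 4, 8.0
    print("new rate:     ", new_bound(T, S, delta))
    print("previous rate:", luo_bound(T, S, delta))
```

With these illustrative values the switch-based term √(ST) is the smaller of the two terms in the new rate, and the new rate is well below the earlier one.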

research
11/06/2021

Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

We consider the problem of controlling a Linear Quadratic Regulator (LQR...
research
08/05/2017

Efficient Contextual Bandits in Non-stationary Worlds

Most contextual bandit algorithms minimize regret to the best fixed poli...
research
07/16/2020

A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual Bandit Problem

We investigate the sparse linear contextual bandit problem where the par...
research
01/25/2019

Almost Boltzmann Exploration

Boltzmann exploration is widely used in reinforcement learning to provid...
research
06/19/2023

High-dimensional Contextual Bandit Problem without Sparsity

In this research, we investigate the high-dimensional linear contextual ...
research
07/09/2020

Recurrent Neural-Linear Posterior Sampling for Non-Stationary Contextual Bandits

An agent in a non-stationary contextual bandit problem should balance be...
research
02/20/2019

A Note on Bounding Regret of the C^2UCB Contextual Combinatorial Bandit

We revisit the proof by Qin et al. (2014) of bounded regret of the C^2UC...
