Efficient Contextual Bandits in Non-stationary Worlds

08/05/2017
by Haipeng Luo, et al.

Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for the non-stationary environments that are ubiquitous in applications. In this work, we obtain efficient contextual bandit algorithms with strong guarantees for alternative notions of regret suited to these non-stationary environments. Two of our algorithms equip existing methods for i.i.d. problems with sophisticated statistical tests, dynamically adapting to a change in distribution. The third approach uses a recent technique for combining multiple bandit algorithms, with each copy starting at a different round so as to learn over different data segments. We analyze several notions of regret for these methods, including the first results on dynamic regret for efficient contextual bandit algorithms.
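To make the first approach concrete, below is a minimal sketch of the restart-on-change-detection structure, assuming a toy epsilon-greedy base learner and a simple windowed Hoeffding mean-shift test. This is not the paper's algorithm: its methods are oracle-efficient and its statistical tests are far more refined. EpsilonGreedyBase, RestartOnDrift, and every threshold here are hypothetical stand-ins meant only to illustrate wrapping an i.i.d. learner with a change detector.

import math
import random

class EpsilonGreedyBase:
    """Toy stand-in for an efficient i.i.d. contextual bandit learner."""
    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def act(self, context=None):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon or not any(self.counts):
            return random.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

class RestartOnDrift:
    """Wraps a base learner and restarts it when a windowed Hoeffding
    test suggests the reward distribution has shifted (a hypothetical
    test, much simpler than those analyzed in the paper)."""
    def __init__(self, make_base, window=200, delta=0.01):
        self.make_base = make_base
        self.window = window
        self.delta = delta
        self.base = make_base()
        self.rewards = []  # rewards observed since the last restart

    def act(self, context=None):
        return self.base.act(context)

    def update(self, arm, reward):
        self.base.update(arm, reward)
        self.rewards.append(reward)
        if self._drift_detected():
            # Distribution change flagged: forget the past and restart.
            self.base = self.make_base()
            self.rewards = []

    def _drift_detected(self):
        n = len(self.rewards)
        if n < 2 * self.window:
            return False
        recent = self.rewards[-self.window:]
        older = self.rewards[:-self.window]
        gap = abs(sum(recent) / len(recent) - sum(older) / len(older))
        # Hoeffding bound on the gap of two sample means of [0, 1] rewards.
        bound = (math.sqrt(math.log(2 / self.delta) / (2 * len(recent)))
                 + math.sqrt(math.log(2 / self.delta) / (2 * len(older))))
        return gap > bound

# Usage per round: arm = learner.act(context); learner.update(arm, reward)
learner = RestartOnDrift(lambda: EpsilonGreedyBase(n_arms=5))

Restarting discards all accumulated state, which is what lets the wrapper track a changing environment rather than converge to a single fixed policy.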

Related research

Multiscale Non-stationary Stochastic Bandits (02/13/2020)
Classic contextual bandit algorithms for linear models, such as LinUCB, ...

Recurrent Neural-Linear Posterior Sampling for Non-Stationary Contextual Bandits (07/09/2020)
An agent in a non-stationary contextual bandit problem should balance be...

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback (01/02/2019)
We investigate the feasibility of learning from both fully-labeled super...

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free (02/03/2019)
We propose the first contextual bandit algorithm that is parameter-free,...

Online Continuous Hyperparameter Optimization for Contextual Bandits (02/18/2023)
In stochastic contextual bandit problems, an agent sequentially makes ac...

Dynamic Memory for Interpretable Sequential Optimisation (06/28/2022)
Real-world applications of reinforcement learning for recommendation and...

Non-stationary Contextual Bandits and Universal Learning (02/14/2023)
We study the fundamental limits of learning in contextual bandits, where...
