Learning Contextual Bandits in a Non-stationary Environment

05/23/2018
by Qingyun Wu, et al.

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems and in many other important real-world problems, such as display advertising. However, such algorithms usually assume a stationary reward distribution, which rarely holds in practice because users' preferences are dynamic. As a result, a recommender system built on them performs consistently suboptimally. In this paper, we consider the situation where the underlying reward distribution remains unchanged over (possibly short) epochs and shifts at unknown time instants. We therefore propose a contextual bandit algorithm that detects possible changes in the environment based on its reward estimation confidence and updates its arm selection strategy accordingly. A rigorous regret upper bound analysis of the proposed algorithm demonstrates its learning effectiveness in such a non-trivial environment. Extensive empirical evaluations on both synthetic and real-world recommendation datasets confirm its practical utility in a changing environment.
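To make the idea of confidence-based change detection concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details are not given in this abstract. It assumes a LinUCB-style linear bandit and a simple reset rule: if the observed reward falls outside the model's confidence interval in more than half of the most recent rounds, the model is discarded and relearned. The class name `ChangeDetectingLinUCB`, the helper `simulate`, and all parameters (`alpha`, `lam`, `margin`, `window`, `reset_frac`) are hypothetical choices made for illustration.

```python
import numpy as np


class ChangeDetectingLinUCB:
    """Illustrative sketch: LinUCB plus a confidence-based reset rule (assumed, not from the paper)."""

    def __init__(self, dim, alpha=1.0, lam=1.0, margin=0.2, window=20, reset_frac=0.5):
        self.dim, self.alpha, self.lam = dim, alpha, lam
        self.margin = margin          # slack for reward noise (assumed value)
        self.window = window          # number of recent rounds inspected
        self.reset_frac = reset_frac  # violation rate that triggers a reset
        self._reset()

    def _reset(self):
        # Ridge-regression statistics: A = lam * I + sum x x^T, b = sum r x.
        self.A = self.lam * np.eye(self.dim)
        self.b = np.zeros(self.dim)
        self.violations = []

    def _width(self, x):
        # Confidence width alpha * sqrt(x^T A^{-1} x).
        return self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))

    def select(self, contexts):
        # Pick the arm with the highest upper confidence bound.
        theta_hat = np.linalg.solve(self.A, self.b)
        scores = [x @ theta_hat + self._width(x) for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Record one observation; return True if a change was declared."""
        theta_hat = np.linalg.solve(self.A, self.b)
        error = abs(reward - x @ theta_hat)
        # A violation: the reward lies outside the confidence interval plus slack.
        self.violations.append(bool(error > self._width(x) + self.margin))
        self.violations = self.violations[-self.window:]
        changed = (len(self.violations) == self.window
                   and np.mean(self.violations) > self.reset_frac)
        if changed:
            self._reset()  # discard the stale model and start over
        self.A += np.outer(x, x)
        self.b += reward * x
        return changed


def simulate(horizon=1000, change_at=500, dim=5, n_arms=10, noise=0.05, seed=0):
    """Toy environment: the reward parameter flips sign once, at round `change_at`."""
    rng = np.random.default_rng(seed)
    theta = np.ones(dim) / np.sqrt(dim)
    agent = ChangeDetectingLinUCB(dim)
    reset_times = []
    for t in range(horizon):
        if t == change_at:
            theta = -theta  # abrupt shift at a time unknown to the agent
        contexts = rng.normal(size=(n_arms, dim))
        contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)
        x = contexts[agent.select(contexts)]
        reward = x @ theta + noise * rng.normal()
        if agent.update(x, reward):
            reset_times.append(t)
    return reset_times


if __name__ == "__main__":
    print("rounds at which a change was declared:", simulate())
```

In this toy setting the reward parameter flips sign at round 500, so predictions made with the old model are far off and the violation rate rises quickly. The window length and reset threshold trade detection delay against false alarms, and the values used here are arbitrary.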


