Non-Stationary Latent Bandits

12/01/2020
by Joey Hong, et al.

Users of recommender systems often behave non-stationarily, as their preferences and tastes evolve over time. In this work, we propose a practical approach to fast personalization for non-stationary users. The key idea is to frame this problem as a latent bandit, in which prototypical models of user behavior are learned offline and the latent state of the user is inferred online from their interactions with those models. We call this problem a non-stationary latent bandit. We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset. The main strength of our approach is that it can be combined with rich offline-learned models, which may be misspecified and are subsequently fine-tuned online using posterior sampling. In this way, we naturally combine the strengths of offline and online learning.
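To illustrate the core idea, here is a minimal, hypothetical sketch of Thompson sampling in a latent bandit: a few prototypical reward models (the matrix `mu` below, standing in for the offline-learned behavior models) are assumed given, and the learner maintains a posterior over which latent state the user is in, sampling a state each round and acting greedily under the sampled model. All names and numbers here are illustrative assumptions, not the paper's actual algorithm or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 latent user states, 3 arms.
# mu[s, k] is the offline-learned mean (Bernoulli) reward of arm k
# when the user is in latent state s -- these play the role of the
# prototypical behavior models learned offline.
mu = np.array([
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
])
true_state = 1  # the user's actual latent state, unknown to the learner

belief = np.full(3, 1.0 / 3.0)  # prior over latent states
total_reward = 0.0

for t in range(2000):
    s = rng.choice(3, p=belief)         # Thompson step: sample a latent state
    a = int(np.argmax(mu[s]))           # act greedily under the sampled model
    r = float(rng.random() < mu[true_state, a])  # observe a Bernoulli reward
    total_reward += r
    # Bayesian update: reweight each state by the likelihood of the reward
    lik = mu[:, a] if r > 0 else 1.0 - mu[:, a]
    belief *= lik
    belief /= belief.sum()

# With well-separated models, the posterior concentrates on the true state.
```

In the non-stationary setting the paper targets, the user's latent state can change over time; a natural extension of this sketch would mix `belief` with an assumed state-transition kernel each round so the posterior can track such switches.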


Related research

- 05/29/2019: Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
  Non-stationarity appears in many online applications such as web search ...
- 01/29/2021: Learning User Preferences in Non-Stationary Environments
  Recommendation systems often use online collaborative filtering (CF) alg...
- 06/15/2023: ReLoop2: Building Self-Adaptive Recommendation Models via Responsive Error Compensation Loop
  Industrial recommender systems face the challenge of operating in non-st...
- 06/15/2020: Latent Bandits Revisited
  A latent bandit problem is one in which the learning agent knows the arm...
- 02/14/2018: Online Learning for Non-Stationary A/B Tests
  The rollout of new versions of a feature in modern applications is a man...
- 10/12/2021: Optimizing Ranking Systems Online as Bandits
  Ranking system is the core part of modern retrieval and recommender syst...
- 06/03/2020: Non-Stationary Bandits with Intermediate Observations
  Online recommender systems often face long delays in receiving feedback,...
