Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

02/29/2020
by Xiao Xu, et al.

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in recommender systems due to users' time-varying interests. Two models, with disjoint and hybrid payoffs, are considered to capture the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across arms. An efficient learning algorithm that adapts to abrupt reward changes is proposed, and a theoretical regret analysis shows that the algorithm achieves regret sublinear in the time horizon T. The algorithm is further extended to a more general setting with hybrid payoffs, where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments on real-world datasets demonstrate the advantages of the proposed learning algorithms over baselines in both settings.
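To make the two payoff models concrete, here is a hedged formalization in the spirit of the standard disjoint/hybrid linear payoff models that the abstract's terminology mirrors; the notation is ours and may differ from the paper's. In the disjoint model, the expected reward of arm a at time t is

    E[r_{t,a}] = x_{t,a}^T \theta_a(t),

where x_{t,a} is the context feature vector and \theta_a(t) is the arm-specific preference vector, piecewise-constant in t with change points that are asynchronous and distinct across arms. In the hybrid model a shared component is added:

    E[r_{t,a}] = z_{t,a}^T \beta(t) + x_{t,a}^T \theta_a(t),

where \beta(t) is the joint coefficient vector common to all arms.

The sketch below illustrates one simple way a disjoint-payoff linear bandit can adapt to piecewise-stationary rewards: a per-arm ridge-regression estimator fit only over a sliding window, so observations from before a change point are eventually forgotten. The sliding window is our own illustrative mechanism, not the paper's algorithm, and the class name and parameters are hypothetical.

import numpy as np

class SlidingWindowLinUCB:
    """Sketch of a disjoint-payoff linear bandit whose per-arm estimate
    tracks a piecewise-stationary preference vector by discarding old
    observations. Illustrative only; the paper's change-adaptive
    algorithm may differ."""

    def __init__(self, n_arms, dim, window=500, alpha=1.0, reg=1.0):
        self.n_arms, self.dim = n_arms, dim
        self.window, self.alpha, self.reg = window, alpha, reg
        # Per-arm history of (context, reward) pairs within the window.
        self.history = [[] for _ in range(n_arms)]

    def _estimate(self, a):
        # Ridge regression on the windowed history of arm a.
        A = self.reg * np.eye(self.dim)
        b = np.zeros(self.dim)
        for x, r in self.history[a]:
            A += np.outer(x, x)
            b += r * x
        A_inv = np.linalg.inv(A)
        return A_inv @ b, A_inv

    def select(self, contexts):
        # contexts: array of shape (n_arms, dim), one feature vector per arm.
        scores = []
        for a in range(self.n_arms):
            theta, A_inv = self._estimate(a)
            x = contexts[a]
            # Upper confidence bound: estimated reward plus exploration bonus.
            ucb = x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, a, x, r):
        self.history[a].append((x, r))
        if len(self.history[a]) > self.window:
            self.history[a].pop(0)  # forget stale pre-change data

With an unbounded window this reduces to standard disjoint LinUCB; shrinking the window trades statistical efficiency during stationary stretches for faster recovery after an abrupt change in \theta_a(t).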


Related research

05/23/2018 · Learning Contextual Bandits in a Non-stationary Environment
Multi-armed bandit algorithms have become a reference solution for handl...

10/23/2021 · The Countable-armed Bandit with Vanishing Arms
We consider a bandit problem with countably many arms, partitioned into ...

04/14/2021 · When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution
Collaborative bandit learning, i.e., bandit algorithms that utilize coll...

01/23/2019 · Online Learning with Diverse User Preferences
In this paper, we investigate the impact of diverse user preference on l...

06/20/2019 · Stochastic One-Sided Full-Information Bandit
In this paper, we study the stochastic version of the one-sided full inf...

03/01/2023 · Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits
Conversational contextual bandits elicit user preferences by occasionall...

05/19/2021 · Incentivized Bandit Learning with Self-Reinforcing User Preferences
In this paper, we investigate a new multi-armed bandit (MAB) online lear...
