Thompson Sampling for a Fatigue-aware Online Recommendation System

01/23/2019
by Yunjuan Wang, et al.

In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the recommended items or by abandoning the platform out of fatigue from seeing items of little use. Assuming a parametric stochastic model of user behavior that captures both the positional effects of the items and the abandonment behavior of users, the platform's goal is to recommend sequences of items that are competitive with the single best sequence of items in hindsight, without knowing the true user model a priori. Naively applying a stochastic bandit algorithm in this setting incurs regret that grows exponentially in the number of items. We propose a new Thompson sampling based algorithm whose expected regret is polynomial in the number of items in this combinatorial setting, and which performs extremely well in practice. We also present a contextual version of our solution.
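To make the setting concrete, below is a minimal sketch of a Thompson sampling loop for sequence recommendation under a simplified cascade-style abandonment model. The catalog size, sequence length, abandonment probability, and per-item Beta-Bernoulli posteriors are all illustrative assumptions; the paper's actual parametric model also captures positional effects, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ITEMS = 20        # catalog size (hypothetical)
SEQ_LEN = 5         # items recommended per round (hypothetical)
T = 10_000          # time horizon

# Hypothetical ground-truth attractiveness of each item, unknown to the learner.
true_attract = rng.uniform(0.05, 0.4, size=N_ITEMS)
ABANDON_PROB = 0.1  # chance the user quits after skipping an item (fatigue)

# Beta(1, 1) priors on each item's attractiveness.
alpha = np.ones(N_ITEMS)
beta = np.ones(N_ITEMS)

for t in range(T):
    # Thompson sampling: draw one posterior sample per item, then
    # recommend the SEQ_LEN items with the highest sampled values.
    theta = rng.beta(alpha, beta)
    seq = np.argsort(theta)[::-1][:SEQ_LEN]

    # Simulate the user scanning the sequence: select, skip, or abandon.
    for item in seq:
        if rng.random() < true_attract[item]:
            alpha[item] += 1   # observed a selection
            break
        beta[item] += 1        # observed a skip
        if rng.random() < ABANDON_PROB:
            break              # user fatigued and left the platform
```

Because each round updates only the posteriors of items the user actually examined, the per-round cost and the posterior state are both linear in the number of items, rather than enumerating all possible sequences, which is the source of the exponential blow-up a naive bandit formulation suffers.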


Related research

08/21/2020
Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation
Online recommendation services recommend multiple commodities to users. ...

08/25/2021
Recommendation System Simulations: A Discussion of Two Key Challenges
As recommendation systems become increasingly standard for online platfo...

01/22/2020
Incentivising Exploration and Recommendations for Contextual Bandits with Payments
We propose a contextual bandit based model to capture the learning and s...

08/28/2020
BLOB: A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals
A common task for recommender systems is to build a profile of the intere...

03/19/2019
Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue
Motivated by the observation that overexposure to unwanted marketing act...

10/05/2018
Online Learning to Rank with Features
We introduce a new model for online ranking in which the click probabili...

10/30/2015
CONQUER: Confusion Queried Online Bandit Learning
We present a new recommendation setting for picking out two items from a...
