Sequential Choice Bandits with Feedback for Personalizing Users' Experience

01/05/2021
by   Anshuka Rangi, et al.

In this work, we study sequential choice bandits with feedback. We propose bandit algorithms for a platform that personalizes users' experience to maximize its rewards. For each action directed to a given user, the platform is given a positive reward, which is a non-decreasing function of the action, if this action is below the user's threshold. Users are equipped with a patience budget, and actions that are above the threshold decrease the user's patience. When all patience is lost, the user abandons the platform. The platform attempts to learn the thresholds of the users in order to maximize its rewards, based on two different feedback models describing the information pattern available to the platform at each action. We define a notion of regret by determining the best action to be taken when the platform knows that the user's threshold is in a given interval. We then propose bandit algorithms for the two feedback models and show that upper and lower bounds on the regret are of the order of Õ(N^2/3) and Ω̃(N^2/3), respectively, where N is the total number of users. Finally, we show that the waiting time of any user before receiving a personalized experience is uniform in N.
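The user model above can be made concrete with a minimal sketch in Python. The class and method names (`User`, `respond`) are hypothetical, and the identity reward is an assumption standing in for the paper's general non-decreasing reward function; the two feedback models themselves are not modeled here:

```python
class User:
    """Toy user in the sequential choice bandit setting (illustrative only)."""

    def __init__(self, threshold, patience):
        self.threshold = threshold  # actions above this value erode patience
        self.patience = patience    # remaining patience budget
        self.active = True          # becomes False once the user abandons

    def respond(self, action):
        """Play one action on this user; return the platform's reward."""
        if not self.active:
            return 0.0              # abandoned users yield nothing
        if action <= self.threshold:
            return float(action)    # reward is non-decreasing in the action
        self.patience -= 1          # action exceeded the threshold
        if self.patience <= 0:
            self.active = False     # all patience lost: the user abandons
        return 0.0
```

For instance, a user with threshold 5 and patience 2 rewards any action up to 5, tolerates two actions above 5, and then abandons; larger actions pay more but risk losing the user, which is the trade-off the bandit algorithms must learn.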

Related research

- Anonymous Bandits for Multi-User Systems (10/21/2022): In this work, we present and study a new framework for online learning i...
- Learning with Abandonment (02/23/2018): Consider a platform that wants to learn a personalized policy for each u...
- Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue (03/19/2019): Motivated by the observation that overexposure to unwanted marketing act...
- Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect (06/18/2020): We study the effect of persistence of engagement on learning in a stocha...
- Filtered Poisson Process Bandit on a Continuum (07/20/2020): We consider a version of the continuum armed bandit where an action indu...
- Leveraging User-Triggered Supervision in Contextual Bandits (02/07/2023): We study contextual bandit (CB) problems, where the user can sometimes r...
- Bandits with Deterministically Evolving States (07/21/2023): We propose a model for learning with bandit feedback while accounting fo...
