Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

06/18/2020
by   Priyank Agrawal, et al.
0

We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting. In advertising and recommendation systems, repetition effect includes a wear-in period, where the user's propensity to reward the platform via a click or purchase depends on how frequently they see the recommendation in the recent past. It also includes a counteracting wear-out period, where the user's propensity to respond positively is dampened if the recommendation was shown too many times recently. Priming effect can be naturally modelled as a temporal constraint on the strategy space, since the reward for the current action depends on historical actions taken by the platform. We provide novel algorithms that achieves sublinear regret in time and the relevant wear-in/wear-out parameters. The effect of priming on the regret upper bound is also additive, and we get back a guarantee that matches popular algorithms such as the UCB1 and Thompson sampling when there is no priming effect. Our work complements recent work on modeling time varying rewards, delays and corruptions in bandits, and extends the usage of rich behavior models in sequential decision making settings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/22/2018

Bandits with Temporal Stochastic Constraints

We study the effect of impairment on stochastic multi-armed bandits and ...
research
07/21/2023

Bandits with Deterministically Evolving States

We propose a model for learning with bandit feedback while accounting fo...
research
03/04/2020

Odds-Ratio Thompson Sampling to Control for Time-Varying Effect

Multi-armed bandit methods have been used for dynamic experiments partic...
research
01/31/2019

Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model

Contextual multi-armed bandit (MAB) algorithms have been shown promising...
research
11/16/2022

Dynamical Linear Bandits

In many real-world sequential decision-making problems, an action does n...
research
07/31/2018

Graph-Based Recommendation System

In this work, we study recommendation systems modelled as contextual mul...
research
01/05/2021

Sequential Choice Bandits with Feedback for Personalizing users' experience

In this work, we study sequential choice bandits with feedback. We propo...

Please sign up or login with your details

Forgot password? Click here to reset