Rebounding Bandits for Modeling Satiation Effects

11/13/2020
by   Liu Leqi, et al.

Psychological research shows that enjoyment of many goods is subject to satiation, with enjoyment declining after repeated exposures to the same item. Nevertheless, proposed algorithms for powering recommender systems seldom model these dynamics, instead proceeding as though user preferences were fixed in time. In this work, we adopt a multi-armed bandit setup, modeling satiation dynamics as a time-invariant linear dynamical system. In our model, the expected rewards for each arm decline monotonically with consecutive exposures and rebound towards the initial reward whenever that arm is not pulled. We analyze this model, showing that, when the arms exhibit deterministic identical dynamics, our problem is equivalent to a specific instance of Max K-Cut. In this case, a greedy policy, which plays the arms in a cyclic order, is optimal. In the general setting, where each arm's satiation dynamics are stochastic and governed by different (unknown) parameters, we propose an algorithm that first uses offline data to estimate each arm's reward model and then plans using a generalization of the greedy policy.
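The rebounding behaviour described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract says only that the dynamics are linear and time-invariant, so the specific update rule, the parameter names (`base`, `gamma`, `lam`), their values, and the helper functions are all hypothetical choices made for illustration.

```python
# Illustrative sketch only. The satiation update and all parameter values
# below are assumptions; the abstract does not specify them.

def simulate(policy, n_arms, horizon, base=1.0, gamma=0.5, lam=1.0):
    """Run `policy` under assumed deterministic, identical satiation dynamics.

    Returns (total_reward, list_of_pulled_arms).
    """
    satiation = [0.0] * n_arms
    total = 0.0
    pulls = []
    for t in range(horizon):
        # Expected reward of each arm falls as its satiation level rises.
        rewards = [base - lam * s for s in satiation]
        arm = policy(t, rewards)
        total += rewards[arm]
        pulls.append(arm)
        # Assumed linear time-invariant update: every arm's satiation decays
        # by a factor gamma, and the pulled arm first gains one unit.
        satiation = [
            gamma * (s + (1.0 if k == arm else 0.0))
            for k, s in enumerate(satiation)
        ]
    return total, pulls


def greedy(t, rewards):
    """Pull the arm with the highest current expected reward (lowest index wins ties)."""
    return max(range(len(rewards)), key=lambda k: rewards[k])


def repeat_first(t, rewards):
    """Baseline that ignores satiation and always pulls arm 0."""
    return 0


greedy_total, greedy_pulls = simulate(greedy, n_arms=3, horizon=6)
repeat_total, _ = simulate(repeat_first, n_arms=3, horizon=6)
print(greedy_pulls)  # [0, 1, 2, 0, 1, 2]
print(greedy_total > repeat_total)
```

With these assumed parameters, the greedy rule settles into a cyclic order over the three identical arms, and it collects more reward than repeatedly pulling one arm, whose reward keeps declining while the others sit at or near their initial value.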

research
02/28/2022

Restless Multi-Armed Bandits under Exogenous Global Markov Process

We consider an extension to the restless multi-armed bandit (RMAB) probl...
research
12/17/2021

Learning in Restless Bandits under Exogenous Global Markov Process

We consider an extension to the restless multi-armed bandit (RMAB) probl...
research
05/22/2021

Combinatorial Blocking Bandits with Stochastic Delays

Recent work has considered natural variations of the multi-armed bandit ...
research
05/12/2022

The Experimental Multi-Arm Pendulum on a Cart: A Benchmark System for Chaos, Learning, and Control

The single, double, and triple pendulum has served as an illustrative ex...
research
05/10/2022

Risk Aversion In Learning Algorithms and an Application To Recommendation Systems

Consider a bandit learning environment. We demonstrate that popular lear...
research
05/19/2021

Incentivized Bandit Learning with Self-Reinforcing User Preferences

In this paper, we investigate a new multi-armed bandit (MAB) online lear...
research
06/14/2023

Decentralized Learning Dynamics in the Gossip Model

We study a distributed multi-armed bandit setting among a population of ...
