Stochastic Bandits with Delay-Dependent Payoffs

10/07/2019
by Leonardo Cella, et al.

Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled. After proving that finding an optimal policy is NP-hard even when all model parameters are known, we introduce a class of ranking policies that provably approximate, to within a constant factor, the expected reward of the optimal policy. We present an algorithm whose regret with respect to the best ranking policy is bounded by Õ(√(kT)), where k is the number of arms and T is the time horizon. Our algorithm uses only O(k ln ln T) switches, which helps when switching between policies is costly. Because constructing the class of ranking policies requires ordering the arms according to their expected rewards, we also bound the number of pulls required to do so. Finally, we run experiments comparing our algorithm against UCB on several problem instances.
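
To make the model concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each arm's expected reward is a nondecreasing function of the delay τ since its last pull, saturating once τ reaches a cap d. It uses a hypothetical linear recovery curve and takes "ranking policy" to mean cyclically pulling the j arms with the highest fully rested reward. The names make_saturating_arm, ranking_policy_value, and best_ranking_policy, along with the parameters peak, d, j, and horizon, are all illustrative. The point is that repeatedly pulling the single best arm can lose to rotating through several arms, because rotation gives each arm time to recover.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model: an arm maps the delay tau (rounds since its last pull)
# to an expected reward.
RewardFn = Callable[[int], float]


def make_saturating_arm(peak: float, d: int) -> RewardFn:
    # Hypothetical recovery curve: reward grows linearly with the delay tau
    # and saturates at `peak` once tau >= d.
    return lambda tau: peak * min(tau, d) / d


def ranking_policy_value(arms: Sequence[RewardFn], d: int, j: int, horizon: int) -> float:
    """Average expected reward of cycling through the j arms ranked highest by mu_i(d)."""
    ranked: List[int] = sorted(range(len(arms)), key=lambda i: arms[i](d), reverse=True)
    cycle = ranked[:j]
    # Assumption: every arm starts fully rested, i.e. last pulled d rounds before t = 0.
    last_pull = {i: -d for i in range(len(arms))}
    total = 0.0
    for t in range(horizon):
        i = cycle[t % j]
        total += arms[i](t - last_pull[i])  # reward depends on delay since last pull
        last_pull[i] = t
    return total / horizon


def best_ranking_policy(
    arms: Sequence[RewardFn], d: int, horizon: int
) -> Tuple[int, Dict[int, float]]:
    """Evaluate every ranking policy j = 1..k with known parameters; return the best j."""
    values = {j: ranking_policy_value(arms, d, j, horizon) for j in range(1, len(arms) + 1)}
    best_j = max(values, key=values.get)
    return best_j, values


# Usage: three hypothetical arms with fully rested rewards 1.0, 0.9 and 0.5.
arms = [make_saturating_arm(p, d=3) for p in (1.0, 0.9, 0.5)]
best_j, values = best_ranking_policy(arms, d=3, horizon=300)
print(best_j, {j: round(v, 3) for j, v in values.items()})
```

In this toy instance, always pulling the top arm earns about a third of its peak reward per round, while cycling through all three arms lets each one fully recover, so j = 3 comes out best. The sketch assumes the reward functions are known. The learning problem in the abstract, where they must be estimated from pulls, is not modeled here.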
