Stochastic Bandits with Delay-Dependent Payoffs

by Leonardo Cella, et al.

Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled. After proving that finding an optimal policy is NP-hard even when all model parameters are known, we introduce a class of ranking policies that provably approximate the expected reward of the optimal policy to within a constant factor. We present an algorithm whose regret with respect to the best ranking policy is bounded by Õ(√(kT)), where k is the number of arms and T is the time horizon. Our algorithm uses only O(k ln ln T) switches, which helps when switching between policies is costly. Since constructing the class of learning policies requires ordering the arms according to their expected rewards, we also bound the number of pulls required to do so. Finally, we run experiments comparing our algorithm against UCB on different problem instances.
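To make the delay-dependent model and the idea of a ranking policy concrete, here is a minimal Python sketch built on assumptions that are not taken from the paper. The payoff function is hypothetical: an arm's expected reward grows linearly with the delay since its last pull and saturates at `d_max`. A "ranking policy" is assumed to mean cycling, in a fixed order, over the `m` arms with the largest base expectations. All function names and parameters are illustrative. The sketch only evaluates expected rewards when the parameters are known. It does not implement the authors' learning algorithm or its regret guarantee.

```python
from typing import Dict, List, Tuple


def expected_reward(base: float, delay: int, d_max: int) -> float:
    """Hypothetical payoff (an assumption, not the paper's model): the reward
    grows linearly with the delay since the last pull and saturates at d_max."""
    return base * min(delay, d_max) / d_max


def ranking_policy_value(bases: List[float], m: int, horizon: int, d_max: int) -> float:
    """Average expected reward per round when cycling over the m arms with the
    largest base expectations (assumed form of a ranking policy)."""
    top_m = sorted(range(len(bases)), key=lambda i: bases[i], reverse=True)[:m]
    # Assumption: every arm starts "fully recovered", as if last pulled d_max rounds ago.
    last_pull = {i: -d_max for i in range(len(bases))}
    total = 0.0
    for t in range(horizon):
        arm = top_m[t % m]  # play the ranked arms in round-robin order
        total += expected_reward(bases[arm], t - last_pull[arm], d_max)
        last_pull[arm] = t
    return total / horizon


def best_ranking_policy(bases: List[float], horizon: int, d_max: int) -> Tuple[int, Dict[int, float]]:
    """Evaluate every cycle length m = 1..k and return the best one with all values."""
    values = {m: ranking_policy_value(bases, m, horizon, d_max) for m in range(1, len(bases) + 1)}
    best_m = max(values, key=values.get)
    return best_m, values


if __name__ == "__main__":
    best_m, values = best_ranking_policy([0.9, 0.8, 0.5, 0.1], horizon=12, d_max=3)
    print(best_m)  # 3
```

With `bases = [0.9, 0.8, 0.5, 0.1]`, `d_max = 3` and a 12-round horizon, cycling over the top three arms gives every arm time to fully recover before it is pulled again, so the sketch selects `m = 3` (average reward about 0.733). Always playing only the best arm averages just 0.35, because each pull after the first sees a delay of one round.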
