Blocking Bandits

07/27/2019
by   Soumya Basu, et al.
0

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same product recommendation repeatedly) or infeasible (e.g. compute job scheduling on machines). We show that with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm (in the number of arms) unless randomized exponential time hypothesis is false, by mapping to the PINWHEEL scheduling problem. Subsequently, we show that a simple greedy algorithm that plays the available arm with the highest reward is asymptotically (1-1/e) optimal. When the rewards are unknown, we design a UCB based algorithm which is shown to have c T + o( T) cumulative regret against the greedy algorithm, leveraging the free exploration of arms due to the unavailability. Finally, when all the delays are equal the problem reduces to Combinatorial Semi-bandits providing us with a lower bound of c' T+ ω( T).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/22/2021

Combinatorial Blocking Bandits with Stochastic Delays

Recent work has considered natural variations of the multi-armed bandit ...
research
03/06/2020

Contextual Blocking Bandits

We study a novel variant of the multi-armed bandit problem, where at eac...
research
08/19/2022

Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits

We study the Improving Multi-Armed Bandit (IMAB) problem, where the rewa...
research
01/30/2021

Recurrent Submodular Welfare and Matroid Blocking Bandits

A recent line of research focuses on the study of the stochastic multi-a...
research
01/17/2023

Optimal Algorithms for Latent Bandits with Cluster Structure

We consider the problem of latent bandits with cluster structure where t...
research
02/14/2021

Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

We consider combinatorial semi-bandits with uncorrelated Gaussian reward...
research
01/12/2016

Infomax strategies for an optimal balance between exploration and exploitation

Proper balance between exploitation and exploration is what makes good d...

Please sign up or login with your details

Forgot password? Click here to reset