Combinatorial Blocking Bandits with Stochastic Delays

05/22/2021
by Alexia Atsidakou, et al.

Recent work has considered natural variations of the multi-armed bandit problem in which the reward distribution of each arm is a special function of the time elapsed since the arm was last pulled. In this direction, a simple (yet widely applicable) model is that of blocking bandits, where an arm becomes unavailable for a deterministic number of rounds after each play. In this work, we extend this model in two directions: (i) we consider the general combinatorial setting, where more than one arm can be played at each round, subject to feasibility constraints; (ii) we allow the blocking time of each arm to be stochastic. We first study the computational/unconditional hardness of this setting and identify the conditions necessary for the problem to become tractable (even in an approximate sense). Based on these conditions, we provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the feasible subset of maximum expected reward among the available (non-blocked) arms. When the arms' expected rewards are unknown, we adapt this heuristic into a UCB-based bandit algorithm, for which we provide sublinear (approximate) regret guarantees that match the theoretical lower bounds in the limiting case where there are no delays.
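
The greedy heuristic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions made only for illustration: feasibility is a simple cardinality constraint (at most `k` arms per round), mean rewards are known, a sampled delay `d >= 1` means the arm can next be played at round `t + d`, and the names `greedy_available`, `simulate`, and `delay_samplers` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_available(
    means: Sequence[float], free_at: Sequence[int], t: int, k: int
) -> List[int]:
    """Return up to k non-blocked arms with the largest mean rewards at round t.

    Assumption: the feasible sets are all subsets of at most k arms.
    """
    available = [i for i in range(len(means)) if free_at[i] <= t]
    available.sort(key=lambda i: means[i], reverse=True)
    return available[:k]


def simulate(
    means: Sequence[float],
    delay_samplers: Sequence[Callable[[], int]],
    k: int,
    horizon: int,
) -> Tuple[float, List[List[int]]]:
    """Run the greedy rule and return (expected reward collected, arms per round)."""
    free_at = [0] * len(means)  # first round at which each arm is playable again
    total = 0.0
    history: List[List[int]] = []
    for t in range(horizon):
        chosen = greedy_available(means, free_at, t, k)
        for i in chosen:
            total += means[i]
            # Assumed convention: a delay d >= 1 blocks the arm until round t + d.
            free_at[i] = t + delay_samplers[i]()
        history.append(chosen)
    return total, history


if __name__ == "__main__":
    # Deterministic delays keep the example easy to follow by hand; a stochastic
    # delay could be supplied as, e.g., lambda: random.randint(1, 3).
    total, history = simulate(
        means=[0.9, 0.5, 0.2],
        delay_samplers=[lambda: 3, lambda: 2, lambda: 1],
        k=1,
        horizon=6,
    )
    print(history)  # [[0], [1], [2], [0], [1], [2]]
```

In this toy run the best arm is blocked for two rounds after each play, so the greedy rule falls back to the next-best available arm. When the means are unknown, the same selection step could in principle be driven by optimistic (UCB-style) estimates in place of `means`, which is the direction the abstract describes; the exact index used in the paper is not reproduced here.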

research
07/27/2019

Blocking Bandits

We consider a novel stochastic multi-armed bandit setting, where playing...
research
01/30/2021

Recurrent Submodular Welfare and Matroid Blocking Bandits

A recent line of research focuses on the study of the stochastic multi-a...
research
07/01/2020

Variable Selection via Thompson Sampling

Thompson sampling is a heuristic algorithm for the multi-armed bandit pr...
research
11/13/2020

Rebounding Bandits for Modeling Satiation Effects

Psychological research shows that enjoyment of many goods is subject to ...
research
06/18/2020

Stochastic bandits with arm-dependent delays

Significant work has been recently dedicated to the stochastic delayed b...
research
10/05/2021

Contextual Combinatorial Volatile Bandits via Gaussian Processes

We consider a contextual bandit problem with a combinatorial action set ...
research
05/29/2022

Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret

The stochastic multi-armed bandit setting has been recently studied in t...
