Ballooning Multi-Armed Bandits

01/24/2020
by   Ganesh Ghalme, et al.

In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a novel extension of the classical stochastic MAB model. In the BL-MAB model, the set of available arms grows (or balloons) over time. In contrast to the classical MAB setting, where regret is computed with respect to the single best arm overall, regret in the BL-MAB setting is computed with respect to the best arm available at each time instant. We first observe that existing MAB algorithms are not regret-optimal for the BL-MAB model. We show that if the best arm is equally likely to arrive at any time, sub-linear regret cannot be achieved, irrespective of the arrivals of the other arms. We further show that if the best arm is more likely to arrive in the early rounds, sub-linear regret is achievable. Our proposed algorithm determines (1) the fraction of the time horizon during which newly arriving arms should be explored and (2) the sequence of arm pulls in the exploitation phase from among the explored arms. Under reasonable assumptions on the arrival distribution of the best arm, stated in terms of the thinness of the distribution's tail, we prove that the proposed algorithm achieves sub-linear instance-independent regret. We further quantify the explicit dependence of the regret on the parameters of the arrival distribution. We reinforce our theoretical findings with extensive simulation results.
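To make the explore-then-exploit structure concrete, below is a minimal Python sketch of a BL-MAB simulation, assuming Bernoulli rewards and a fixed arrival probability per round. The names bl_mab_sketch, alpha, and arrival_prob are illustrative, and the use of UCB1 over the admitted arms is an assumption; the paper's actual rule for choosing the exploration fraction and the exploitation sequence is more refined than this.

```python
import math
import random

def bl_mab_sketch(T=10_000, alpha=0.4, arrival_prob=0.3, seed=0):
    """Explore-then-exploit sketch for a Ballooning MAB instance.

    New arms are admitted only during the first alpha*T rounds (the
    exploration window); afterwards, UCB1 is played over the admitted
    arms. alpha and arrival_prob are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    means = [rng.random()]   # one arm is available at the start
    pulls = [0]              # per-arm pull counts
    sums = [0.0]             # per-arm cumulative rewards

    def ucb(i, t):
        if pulls[i] == 0:
            return float("inf")  # force at least one pull of each arm
        return sums[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])

    reward = 0.0
    for t in range(1, T + 1):
        # Arms balloon over time; only those arriving within the
        # exploration window of alpha*T rounds are ever considered.
        if t <= alpha * T and rng.random() < arrival_prob:
            means.append(rng.random())
            pulls.append(0)
            sums.append(0.0)
        # Pull the admitted arm with the highest UCB1 index.
        i = max(range(len(means)), key=lambda j: ucb(j, t))
        r = 1.0 if rng.random() < means[i] else 0.0  # Bernoulli reward
        pulls[i] += 1
        sums[i] += r
        reward += r
    return reward, len(means)

if __name__ == "__main__":
    total, n_arms = bl_mab_sketch()
    print(f"total reward: {total:.0f} over {n_arms} admitted arms")
```

The key design choice this illustrates is that admitting arms forever makes sub-linear regret impossible when the best arm can arrive at any time; capping admissions at a fraction of the horizon trades off missing a late-arriving best arm against wasting pulls on an ever-growing arm set.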


Related research

- Stochastic Multi-armed Bandits in Constant Space (12/25/2017)
- Multi-Armed Bandits with Dependent Arms (10/13/2020)
- Addressing the Long-term Impact of ML Decisions via Policy Regret (06/02/2021)
- The Symmetry between Arms and Knapsacks: A Primal-Dual Approach for Bandits with Knapsacks (02/12/2021)
- Learning in Restless Multi-Armed Bandits via Adaptive Arm Sequencing Rules (06/19/2019)
- Online Learning with Diverse User Preferences (01/23/2019)
- Batched Bandits with Crowd Externalities (09/29/2021)
