Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms

06/17/2022
by Xuchuang Wang, et al.

We generalize the multiple-play multi-armed bandits (MP-MAB) problem to a shareable-arm setting, in which several plays can share the same arm. Each shareable arm has a finite reward capacity and a "per-load" reward distribution, both of which are unknown to the learner. The reward from a shareable arm is load-dependent: the "per-load" reward is multiplied by either the number of plays pulling the arm or, when that number exceeds the capacity limit, the arm's reward capacity. When the "per-load" reward follows a Gaussian distribution, we prove a sample complexity lower bound for learning the capacity from load-dependent rewards, as well as a regret lower bound for this new MP-MAB problem. We devise a capacity estimator whose sample complexity upper bound matches the lower bound in terms of reward means and capacities. We also propose an online learning algorithm for the problem and prove its regret upper bound. The first term of this regret upper bound matches the first term of the regret lower bound, and its second and third terms also clearly correspond to terms of the lower bound. Extensive experiments validate our algorithm's performance and its gain in 5G & 4G base station selection.
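The load-dependent reward model described above can be sketched in a few lines. This is an illustrative simulation, not the paper's code; the function name and parameters (`num_plays`, `capacity`, `mu`, `sigma`) are our own, and we assume the Gaussian "per-load" reward from the abstract:

```python
import random


def shareable_arm_reward(num_plays, capacity, mu, sigma=1.0, rng=random):
    """Simulate one load-dependent reward draw from a shareable arm.

    The per-load reward is Gaussian with mean `mu` and std `sigma`;
    the total reward multiplies it by the effective load, i.e. the
    number of plays on the arm capped at the arm's reward capacity.
    """
    per_load = rng.gauss(mu, sigma)          # unknown per-load reward draw
    effective_load = min(num_plays, capacity)  # capacity caps the load
    return per_load * effective_load
```

For example, with a noiseless per-load reward of 2.0, an arm of capacity 5 pulled by 3 plays yields 6.0, while 10 plays on the same arm yield only 10.0, since the capacity limit binds.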


