Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization

10/27/2021
by Chengshuai Shi, et al.

Despite significant interest and much progress in decentralized multi-player multi-armed bandit (MP-MAB) problems in recent years, the regret gap to the natural centralized lower bound in the heterogeneous MP-MAB setting has remained open. In this paper, we propose BEACON (Batched Exploration with Adaptive COmmunicatioN), which closes this gap. BEACON accomplishes this goal with novel contributions in implicit communication and efficient exploration. For the former, we propose an adaptive differential communication (ADC) design that significantly improves implicit communication efficiency. For the latter, a carefully crafted batched exploration scheme is developed that enables incorporation of the combinatorial upper confidence bound (CUCB) principle. We then generalize the existing linear-reward MP-MAB problem, in which the system reward is always the sum of individually collected rewards, to a new MP-MAB problem in which the system reward is a general (nonlinear) function of the individual rewards. We extend BEACON to solve this problem and prove a logarithmic regret. BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjoint areas of MAB research, and the results in this paper suggest that this previously overlooked connection is worth further investigation.
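The combinatorial core of the exploration step can be illustrated with a minimal, centralized CUCB sketch for the heterogeneous setting, where each (player, arm) pair has its own mean reward and the combinatorial oracle is a maximum-weight matching. This is a sketch of the CUCB principle only, not the paper's algorithm: it omits BEACON's batching and implicit communication, and all names and constants below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
M, K, T = 3, 5, 20000                     # players, arms, horizon (illustrative)
mu = rng.uniform(0.1, 0.9, size=(M, K))   # unknown per-(player, arm) means

counts = np.zeros((M, K))                 # pulls of each (player, arm) pair
sums = np.zeros((M, K))                   # accumulated rewards per pair

# Initialization: K shifted, collision-free rounds so every pair is pulled once.
for shift in range(K):
    arms = (np.arange(M) + shift) % K
    counts[np.arange(M), arms] += 1
    sums[np.arange(M), arms] += rng.binomial(1, mu[np.arange(M), arms])

for t in range(M * K, T):
    ucb = sums / counts + np.sqrt(1.5 * np.log(t) / counts)  # UCB index per pair
    # CUCB step: feed optimistic indices to the combinatorial oracle; with a
    # linear (sum) system reward, the oracle is a max-weight matching.
    rows, cols = linear_sum_assignment(-ucb)                 # maximize total UCB
    counts[rows, cols] += 1
    sums[rows, cols] += rng.binomial(1, mu[rows, cols])
```

In the decentralized setting of the paper, players cannot share `sums` and `counts` directly; roughly speaking, the ADC design conveys quantized, differential statistics implicitly, and the batched exploration amortizes that communication cost across many pulls.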
