Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications

04/28/2022
by Xuchuang Wang, et al.

Multi-player multi-armed bandits (MMAB) study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative reward. Most existing MMAB models assume that when more than one player pulls the same arm, the players either collide and obtain zero reward, or do not collide and gain independent rewards; both assumptions are usually too restrictive in practical scenarios. In this paper, we propose an MMAB with shareable resources as an extension of the collision and non-collision settings. Each shareable arm has finite shareable resources and a "per-load" reward random variable, both of which are unknown to the players. The reward from a shareable arm equals the "per-load" reward multiplied by the minimum of the number of players pulling the arm and the arm's maximal shareable resources. We consider two types of feedback: sharing demand information (SDI) and sharing demand awareness (SDA), each of which provides a different signal of resource sharing. We design the DPE-SDI and SIC-SDA algorithms to address the shareable arm problem under these two types of feedback, respectively, and prove that both algorithms achieve logarithmic regret that is tight in the number of rounds. We conduct simulations to validate both algorithms' performance and demonstrate their utility in wireless networking and edge computing.
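The reward rule stated in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `arm_reward` and `total_reward` and the example numbers are hypothetical, and the per-load reward, which the abstract describes as a random variable, is passed in here as an already realized sample.

```python
from collections import Counter
from typing import Sequence


def arm_reward(num_players: int, capacity: int, per_load_reward: float) -> float:
    """Reward of one shareable arm in one round.

    Per the abstract: per-load reward times the minimum of the number of
    players pulling the arm and the arm's maximal shareable resources.
    """
    return per_load_reward * min(num_players, capacity)


def total_reward(
    choices: Sequence[int],
    capacities: Sequence[int],
    per_load_rewards: Sequence[float],
) -> float:
    """Sum of arm rewards for one round.

    choices[i] is the arm index pulled by player i (hypothetical encoding).
    per_load_rewards[k] is an assumed realized per-load sample for arm k.
    """
    loads = Counter(choices)  # number of players on each pulled arm
    return sum(
        arm_reward(n, capacities[k], per_load_rewards[k])
        for k, n in loads.items()
    )


if __name__ == "__main__":
    # Three players share arm 0 (capacity 2); one player pulls arm 1 (capacity 1).
    print(total_reward([0, 0, 0, 1], [2, 1], [0.5, 1.0]))
```

In this example the third player on arm 0 adds nothing, because the arm's capacity of 2 is already saturated; this saturation is what distinguishes the shareable-resource model from one in which every additional player earns a reward.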


research
09/28/2019

An Optimal Algorithm in Multiplayer Multi-Armed Bandits

The paper addresses the Multiplayer Multi-Armed Bandit (MMAB) problem, w...
research
11/15/2022

Multi-Player Bandits Robust to Adversarial Collisions

Motivated by cognitive radios, stochastic Multi-Player Multi-Armed Bandi...
research
02/04/2019

New Algorithms for Multiplayer Bandits when Arm Means Vary Among Players

We study multiplayer stochastic multi-armed bandit problems in which the...
research
06/21/2021

Smooth Sequential Optimisation with Delayed Feedback

Stochastic delays in feedback lead to unstable sequential learning using...
research
05/30/2023

Competing for Shareable Arms in Multi-Player Multi-Armed Bandits

Competitions for shareable and limited resources have long been studied ...
research
10/21/2019

Multi-player Multi-Armed Bandits with non-zero rewards on collisions for uncoordinated spectrum access

In this paper, we study the uncoordinated spectrum access problem using ...
research
02/29/2020

Decentralized Multi-player Multi-armed Bandits with No Collision Information

The decentralized stochastic multi-player multi-armed bandit (MP-MAB) pr...
