An Optimal Algorithm in Multiplayer Multi-Armed Bandits

09/28/2019
by Alexandre Proutiere, et al.

The paper addresses the Multiplayer Multi-Armed Bandit (MMAB) problem, in which M decision makers, or players, collaborate to maximize their cumulative reward. When several players select the same arm, a collision occurs: no reward is collected on that arm, and the players involved are informed of the collision. We present DPE (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same regret as an optimal centralized algorithm, and hence better regret guarantees than the state-of-the-art algorithm SIC-MMAB (Boursier and Perchet, 2019). As in SIC-MMAB, players communicate through collisions only. An additional important advantage of DPE is that it requires very little communication: the expected number of rounds in which players use collisions to communicate is finite.
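To make the feedback model concrete, below is a minimal Python sketch of the MMAB interaction described above, assuming Bernoulli arm rewards. The names MMABEnv and step are illustrative, not taken from the paper, and the sketch models only the environment, not the DPE algorithm itself.

```python
# Minimal sketch of the MMAB collision model from the abstract (assumed
# Bernoulli rewards). Colliding players receive no reward but are told
# that a collision occurred; a player alone on an arm draws its reward.
import random

class MMABEnv:
    """K arms, M players; collisions yield zero reward and a collision flag."""

    def __init__(self, means, n_players, seed=0):
        self.means = means          # Bernoulli mean of each arm
        self.n_players = n_players
        self.rng = random.Random(seed)

    def step(self, choices):
        """choices[m] = arm pulled by player m. Returns (rewards, collisions)."""
        counts = {}
        for arm in choices:
            counts[arm] = counts.get(arm, 0) + 1
        rewards, collisions = [], []
        for arm in choices:
            if counts[arm] > 1:     # collision: no reward, flag observed
                rewards.append(0.0)
                collisions.append(True)
            else:                   # sole player on the arm: Bernoulli draw
                rewards.append(float(self.rng.random() < self.means[arm]))
                collisions.append(False)
        return rewards, collisions

# Example round: players 0 and 1 collide on arm 2; player 2 is alone on arm 0.
env = MMABEnv(means=[0.9, 0.5, 0.7], n_players=3)
print(env.step([2, 2, 0]))
```

In DPE, as in SIC-MMAB, the collision flags returned by such an environment are the only channel through which players can exchange information; DPE's guarantee is that the expected number of rounds spent colliding on purpose to communicate is finite.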


Related research

02/04/2019 · New Algorithms for Multiplayer Bandits when Arm Means Vary Among Players
We study multiplayer stochastic multi-armed bandit problems in which the...

09/21/2018 · SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits
We consider the stochastic multiplayer multi-armed bandit problem, where...

04/28/2022 · Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms and Applications
Multi-player multi-armed bandits (MMAB) study how decentralized players ...

02/19/2022 · The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication
We study the stochastic multi-player multi-armed bandit problem. In this...

08/25/2018 · Multiplayer bandits without observing collision information
We study multiplayer stochastic multi-armed bandit problems in which the...

09/17/2018 · Multi-Player Bandits: A Trekking Approach
We study stochastic multi-armed bandits with many players. The players d...

06/25/2021 · Multi-player Multi-armed Bandits with Collision-Dependent Reward Distributions
We study a new stochastic multi-player multi-armed bandits (MP-MAB) prob...
