Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions

11/08/2020
by   Sébastien Bubeck, et al.

We consider the cooperative multi-player version of the stochastic multi-armed bandit problem. We study the regime where the players cannot communicate but have access to shared randomness. In prior work by the first two authors, a strategy for this regime was constructed for two players and three arms, with regret Õ(√T) and with no collisions at all between the players (with very high probability). In this paper we show that these properties (near-optimal regret and no collisions at all) are achievable for any number of players and arms. At a high level, the previous strategy relied heavily on a 2-dimensional geometric intuition that was difficult to generalize to higher dimensions, whereas here we take a more combinatorial route to build the new strategy.
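The key idea in the abstract, that shared randomness alone can rule out collisions without any communication, can be illustrated with a toy sketch. This is not the paper's strategy; it only shows the coordination mechanism: if every player derives the same random permutation from a common seed and the round index, and player i always pulls the arm at position i, no two players can ever collide. The function name and parameters below are illustrative assumptions.

```python
import random

def arms_this_round(t, num_players, num_arms, shared_seed):
    # Every player runs this same deterministic computation locally.
    # Combining the shared seed with the round index t yields the SAME
    # random assignment of distinct arms for all players, with no
    # messages exchanged.
    rng = random.Random(shared_seed * 1_000_003 + t)
    assignment = rng.sample(range(num_arms), num_players)
    return assignment  # assignment[i] is the arm pulled by player i

# Player i in round t simply pulls arms_this_round(t, M, K, seed)[i]:
# distinct positions in one permutation, hence zero collisions.
choices = arms_this_round(t=0, num_players=3, num_arms=5, shared_seed=42)
```

The actual strategy in the paper must additionally balance exploration and exploitation to achieve near-optimal regret; this sketch only captures the collision-avoidance role of shared randomness.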

Related research

02/14/2020 · Coordination without communication: optimal regret in two players multi-armed bandits
We consider two agents playing simultaneously the same stochastic three-...

04/28/2019 · Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
We consider the non-stochastic version of the (cooperative) multi-player...

12/09/2015 · Multi-Player Bandits -- a Musical Chairs Approach
We consider a variant of the stochastic multi-armed bandit problem, wher...

11/08/2021 · An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit
We study the problem of information sharing and cooperation in Multi-Pla...

06/10/2015 · Explore no more: Improved high-probability regret bounds for non-stochastic bandits
This work addresses the problem of regret minimization in non-stochastic...

02/19/2022 · The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication
We study the stochastic multi-player multi-armed bandit problem. In this...

02/23/2020 · My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits
Consider N cooperative but non-communicating players where each plays on...
