Coordination without communication: optimal regret in two players multi-armed bandits

02/14/2020
by Sébastien Bubeck, et al.

We consider two agents simultaneously playing the same stochastic three-armed bandit problem. The agents cooperate but cannot communicate. We propose a strategy that, with very high probability, incurs no collisions at all between the players and achieves near-optimal regret O(√(T log(T))). We also provide evidence that the extra √(log(T)) factor is necessary, via a lower bound for a variant of the problem.
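To make the setting concrete, below is a minimal simulation sketch of the collision model standard in multi-player bandits: two players each pull one of three arms per round, and if they pick the same arm, both receive reward 0. This is not the strategy from the paper; the function name simulate, the explore_rounds parameter, and the explore-then-commit baseline are all illustrative assumptions. In the baseline, the players explore in offset round-robin slots (collision-free by construction) and then commit to their empirically best and second-best arms, respectively.

import numpy as np

def simulate(means, T=10_000, explore_rounds=300, seed=0):
    # Toy collision model (an assumption matching the multi-player
    # bandit literature, not the paper's exact protocol): two players,
    # K arms, Bernoulli rewards; a shared pull yields 0 for both.
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros((2, K))  # per-player pull counts
    sums = np.zeros((2, K))    # per-player reward sums
    total = 0.0
    for t in range(T):
        if t < explore_rounds:
            # Offset round-robin: the players' slots never overlap,
            # so the exploration phase is collision-free.
            arms = [t % K, (t + 1) % K]
        else:
            # Commit: player 0 to its empirically best arm,
            # player 1 to its empirically second-best arm.
            est0 = sums[0] / np.maximum(counts[0], 1)
            est1 = sums[1] / np.maximum(counts[1], 1)
            arms = [int(np.argsort(-est0)[0]), int(np.argsort(-est1)[1])]
        if arms[0] == arms[1]:
            rewards = [0.0, 0.0]  # collision: both players get nothing
        else:
            rewards = [float(rng.random() < means[a]) for a in arms]
        for p in range(2):
            counts[p, arms[p]] += 1
            sums[p, arms[p]] += rewards[p]
            total += rewards[p]
    # Regret against always playing the two best distinct arms.
    return sum(sorted(means)[-2:]) * T - total

print(simulate([0.5, 0.6, 0.7]))

Note the weakness the sketch exposes: if the two players' empirical rankings disagree after exploration, they can collide on every remaining round. The paper's contribution is a strategy that avoids such collisions with very high probability while keeping regret O(√(T log(T))), all without communication.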

Related research

11/08/2020 · Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions
We consider the cooperative multi-player version of the stochastic multi...

04/12/2019 · Distributed Bandit Learning: How Much Communication is Needed to Achieve (Near) Optimal Regret
We study the communication complexity of distributed multi-armed bandits...

02/13/2016 · Conservative Bandits
We study a novel multi-armed bandit problem that models the challenge fa...

06/08/2021 · Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions
We study the problem of stochastic bandits with adversarial corruptions ...

07/10/2018 · Bandits with Side Observations: Bounded vs. Logarithmic Regret
We consider the classical stochastic multi-armed bandit but where, from ...

10/20/2018 · Quantifying the Burden of Exploration and the Unfairness of Free Riding
We consider the multi-armed bandit setting with a twist. Rather than hav...

10/26/2015 · A Parallel algorithm for X-Armed bandits
The target of X-armed bandit problem is to find the global maximum of an...
