Distributed Bandit Learning: How Much Communication is Needed to Achieve (Near) Optimal Regret

04/12/2019
by   Yuanhao Wang, et al.

We study the communication complexity of distributed multi-armed bandits (MAB) and distributed linear bandits for regret minimization. We propose communication protocols that achieve near-optimal regret bounds and result in optimal speed-up under mild conditions. We measure the communication cost of a protocol by the total number of real numbers communicated. For multi-armed bandits, we give two protocols with low communication cost: one is independent of the time horizon T, and the other is independent of the number of arms K. In particular, for a distributed K-armed bandit with M agents, our protocols achieve near-optimal regret O(√(MKT log T)) with O(M log T) and O(MK log M) communication cost respectively. We also propose two protocols for d-dimensional distributed linear bandits that achieve near-optimal regret with O(M^1.5 d^3) and O((Md + d log log d) log T) communication cost respectively. Thus the communication cost can be made independent of T, or almost linear in d.
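The abstract does not spell out the protocols themselves, but the general flavor of communication-efficient distributed bandit learning can be illustrated with a toy simulation: each agent runs UCB on its local statistics and agents pool their counts only on a doubling schedule, so the number of synchronization rounds grows like O(log T) rather than linearly in T. This is a minimal sketch under assumptions of our own (Bernoulli rewards, the function name distributed_ucb, the specific confidence radius and warm start), not the paper's actual algorithm or analysis.

```python
import numpy as np

def distributed_ucb(means, M=4, T=10_000, rng=None):
    """Toy simulation: M agents play the same K-armed bandit with UCB,
    pooling their arm statistics only at doubling epoch boundaries,
    so the number of communication rounds is O(log T).
    `means` holds the true arm means (Bernoulli rewards here)."""
    rng = rng or np.random.default_rng(0)
    K = len(means)
    # Local statistics per agent since the last synchronization.
    counts = np.zeros((M, K))
    sums = np.zeros((M, K))
    # Shared statistics, updated only at sync points (warm-started for simplicity).
    g_counts = np.ones(K)
    g_sums = np.array(means, dtype=float)
    regret = 0.0
    comm_rounds = 0
    next_sync = 1
    for t in range(1, T + 1):
        for m in range(M):
            tot = g_counts + counts[m]
            mean_hat = (g_sums + sums[m]) / tot
            ucb = mean_hat + np.sqrt(2 * np.log(t * M) / tot)
            a = int(np.argmax(ucb))
            r = float(rng.random() < means[a])
            counts[m, a] += 1
            sums[m, a] += r
            regret += max(means) - means[a]
        if t >= next_sync:  # doubling schedule -> O(log T) sync rounds
            g_counts += counts.sum(axis=0)
            g_sums += sums.sum(axis=0)
            counts[:] = 0
            sums[:] = 0
            comm_rounds += 1
            next_sync *= 2
    return regret, comm_rounds

if __name__ == "__main__":
    reg, rounds = distributed_ucb([0.5, 0.6, 0.7, 0.55])
    print(f"total regret ~ {reg:.1f} over {rounds} communication rounds")
```

Each synchronization here broadcasts O(MK) numbers, so the toy scheme's total communication is O(MK log T); the paper's protocols improve on this, achieving cost independent of T or independent of K as stated in the abstract.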


Related research

- 02/14/2020: Coordination without communication: optimal regret in two players multi-armed bandits. "We consider two agents playing simultaneously the same stochastic three-..."
- 05/18/2015: Simple regret for infinitely many armed bandits. "We consider a stochastic bandit problem with infinitely many arms. In th..."
- 06/10/2022: Communication Efficient Distributed Learning for Kernelized Contextual Bandits. "We tackle the communication efficiency challenge of learning kernelized ..."
- 02/26/2023: No-Regret Linear Bandits beyond Realizability. "We study linear bandits when the underlying reward function is not linea..."
- 11/01/2021: Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure. "Multi-agent reinforcement learning (MARL) problems are challenging due t..."
- 09/26/2020: Near-Optimal MNL Bandits Under Risk Criteria. "We study MNL bandits, which is a variant of the traditional multi-armed ..."
- 01/04/2021: Be Greedy in Multi-Armed Bandits. "The Greedy algorithm is the simplest heuristic in sequential decision pr..."
