Distributed Bandit Learning: How Much Communication is Needed to Achieve (Near) Optimal Regret

04/12/2019
by Yuanhao Wang, et al.

We study the communication complexity of distributed multi-armed bandits (MAB) and distributed linear bandits for regret minimization. We propose communication protocols that achieve near-optimal regret bounds and yield optimal speed-up under mild conditions. We measure the communication cost of a protocol by the total number of communicated numbers. For multi-armed bandits, we give two protocols that incur little communication cost: one is independent of the time horizon T, and the other is independent of the number of arms K. In particular, for a distributed K-armed bandit with M agents, our protocols achieve near-optimal regret O(√(MKT log T)) with O(M log T) and O(MK log M) communication cost, respectively. We also propose two protocols for d-dimensional distributed linear bandits that achieve near-optimal regret with O(M^{1.5}d^3) and O((Md + d log log d) log T) communication cost, respectively. The communication cost can thus be made either independent of T or almost linear in d.
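The abstract does not spell out the protocols themselves, but the flavor of the setting can be illustrated with a toy simulation. Below is a minimal, hypothetical sketch (not the paper's actual protocol): M agents each run UCB on K arms and synchronize their per-arm statistics through a central server on a doubling schedule, so the number of synchronization rounds grows only logarithmically in the horizon T. All names and parameters here are illustrative assumptions; this toy version communicates O(MK log T) numbers in total, which is looser than the O(M log T) bound claimed for the paper's first protocol.

```python
import numpy as np

# Toy sketch of distributed MAB with periodic synchronization (illustrative
# only, not the protocol from the paper). Each agent keeps local per-arm
# counts and reward sums; at exponentially spaced rounds, the server
# aggregates them, so there are only O(log T) communication rounds.

rng = np.random.default_rng(0)
M, K, T = 4, 5, 10_000             # agents, arms, per-agent horizon (assumed)
mu = rng.uniform(0.0, 1.0, K)      # unknown Bernoulli arm means

counts = np.zeros((M, K))          # local pull counts since last sync
sums = np.zeros((M, K))            # local reward sums since last sync
g_counts = np.zeros(K)             # server-side aggregated counts
g_sums = np.zeros(K)               # server-side aggregated reward sums
next_sync, regret = 1, 0.0

for t in range(1, T + 1):
    for m in range(M):
        # UCB index from the global statistics plus this agent's fresh data
        n = g_counts + counts[m]
        means = (g_sums + sums[m]) / np.maximum(n, 1)
        ucb = means + np.sqrt(2 * np.log(M * t + 1) / np.maximum(n, 1))
        ucb[n == 0] = np.inf       # pull unexplored arms first
        a = int(np.argmax(ucb))
        r = float(rng.random() < mu[a])   # Bernoulli reward
        counts[m, a] += 1
        sums[m, a] += r
        regret += mu.max() - mu[a]
    if t == next_sync:             # doubling schedule: O(log T) sync rounds
        g_counts += counts.sum(axis=0)
        g_sums += sums.sum(axis=0)
        counts[:] = 0
        sums[:] = 0
        next_sync *= 2

print(f"total regret over {M} agents and {T} rounds: {regret:.1f}")
```

The doubling schedule is the key design choice in this sketch: spacing synchronizations exponentially keeps the number of communication rounds logarithmic in T while letting agents act on slightly stale global statistics in between.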

