Multi-Agent Multi-Armed Bandits with Limited Communication

02/10/2021
by   Mridul Agarwal, et al.

We consider the problem where N agents collaboratively interact with an instance of a stochastic K-armed bandit problem, with K ≫ N. The agents aim to simultaneously minimize three quantities: the cumulative regret over all agents across a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm in which each agent communicates only at the end of an epoch and shares the index of the best arm it knows. With LCC-UCB, each agent enjoys a regret of Õ(√((K/N + N)T)), communicates for O(log T) steps, and broadcasts O(log K) bits in each communication step. We extend the work to sparse graphs with maximum degree K_G and diameter D, and propose LCC-UCB-GRAPH, which enjoys a regret bound of Õ(D√((K/N + K_G)DT)). Finally, we empirically show that the LCC-UCB and LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node.
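The doubling-epoch idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify how arms are assigned or how "the best arm it knows" is chosen, so the round-robin split of arms, the UCB1 index, the use of the most-pulled arm as the broadcast choice, and the function name `run_lcc_ucb_sketch` are all assumptions made here for illustration.

```python
import math
import random


def ucb_index(total, pulls, t):
    """UCB1 index; unpulled arms get +inf so they are tried first."""
    if pulls == 0:
        return float("inf")
    return total / pulls + math.sqrt(2.0 * math.log(max(t, 2)) / pulls)


def run_lcc_ucb_sketch(arm_means, n_agents, horizon, seed=0):
    """Illustrative doubling-epoch collaboration on Bernoulli arms (a sketch only)."""
    rng = random.Random(seed)
    k = len(arm_means)
    # Assumption: arms are split round-robin, so each agent owns about K/N arms.
    own = [list(range(a, k, n_agents)) for a in range(n_agents)]
    shared = set()  # arm indices received in the last broadcast (at most N)
    sums = [[0.0] * k for _ in range(n_agents)]
    pulls = [[0] * k for _ in range(n_agents)]
    best_mean = max(arm_means)
    regret, rounds, t, epoch_len = 0.0, 0, 0, 1

    while t < horizon:
        steps = min(epoch_len, horizon - t)
        for a in range(n_agents):
            # Each agent plays UCB over its own arms plus the recommended ones.
            active = sorted(set(own[a]) | shared)
            for s in range(steps):
                now = t + s + 1
                arm = max(active,
                          key=lambda i: ucb_index(sums[a][i], pulls[a][i], now))
                reward = 1.0 if rng.random() < arm_means[arm] else 0.0
                sums[a][arm] += reward
                pulls[a][arm] += 1
                regret += best_mean - arm_means[arm]
        t += steps
        # End of epoch: every agent broadcasts one arm index.
        # Assumption: "best arm it knows" is taken to be its most-pulled arm.
        shared = {max(range(k), key=lambda i: pulls[a][i])
                  for a in range(n_agents)}
        rounds += 1
        epoch_len *= 2  # doubling epochs give O(log T) communication rounds

    # One arm index needs about log2(K) bits.
    bits = max(1, math.ceil(math.log2(k)))
    return {"regret": regret, "rounds": rounds, "bits_per_message": bits}
```

Under these assumptions, a call such as `run_lcc_ucb_sketch(means, n_agents=4, horizon=1000)` with 16 arms uses 10 communication rounds (epochs of length 1, 2, 4, ...) and 4-bit messages, matching the O(log T) and O(log K) scaling stated in the abstract. The regret it reports says nothing about the paper's bound.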


