Decentralized Cooperative Stochastic Multi-armed Bandits

10/10/2018
by David Martinez Rubio, et al.

We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on a network of N agents. In our model, the reward distribution of each arm is agent-independent. Each agent iteratively chooses one arm to play and then communicates with her neighbors. The aim is to minimize the total regret accumulated over the network. We design a fully decentralized algorithm that uses a running consensus procedure to compute, with some delay, accurate estimates of the average reward obtained by all the agents for each arm, and then applies an upper confidence bound rule that accounts for the delay and error of these estimates. We analyze the algorithm and show that, up to a constant factor, our regret bounds are better for every network than those of other algorithms designed to solve the same problem. For some graphs, our regret bounds are significantly better.
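
To make the scheme concrete, here is a minimal sketch of the two ingredients the abstract describes: a running consensus that spreads reward and pull-count statistics over the network, and a per-agent UCB rule built on those consensus estimates. The ring topology, the mixing matrix P, and the plain UCB confidence width are illustrative assumptions; the paper's actual confidence bound additionally corrects for the delay and error of the consensus estimates, which this sketch omits.

```python
# A minimal sketch, not the paper's exact algorithm: each agent applies
# a UCB rule to running-consensus estimates of the network-wide reward
# sums and pull counts. Topology, mixing matrix, and confidence width
# are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 8, 5, 2000                        # agents, arms, horizon
mu = rng.uniform(0.2, 0.8, size=K)          # arm means (agent-independent)

# Doubly stochastic mixing matrix for a ring of N agents (assumption).
P = np.zeros((N, N))
for i in range(N):
    P[i, i] = 0.5
    P[i, (i - 1) % N] = 0.25
    P[i, (i + 1) % N] = 0.25

S = np.zeros((N, K))  # per-agent estimate of average reward sum per arm
C = np.zeros((N, K))  # per-agent estimate of average pull count per arm
regret = 0.0

for t in range(1, T + 1):
    rewards = np.zeros((N, K))
    pulls = np.zeros((N, K))
    for i in range(N):
        if t <= K:                          # initialization: play each arm once
            a = t - 1
        else:
            # N * C[i] approximates the total number of pulls in the network.
            n = np.maximum(N * C[i], 1.0)
            mean = S[i] / np.maximum(C[i], 1e-9)
            a = int(np.argmax(mean + np.sqrt(2.0 * np.log(t) / n)))
        r = rng.binomial(1, mu[a])          # Bernoulli reward draw
        rewards[i, a] = r
        pulls[i, a] = 1.0
        regret += mu.max() - mu[a]
    # Running consensus: mix states with neighbors, then add local
    # increments. Since P is doubly stochastic, each row of S (resp. C)
    # tracks, with delay, the network-average cumulative reward
    # (resp. pull count) for every arm.
    S = P @ S + rewards
    C = P @ C + pulls

print(f"total network regret ~ {regret:.1f}")
```

Because P is doubly stochastic, one round of mixing preserves the network-wide totals while each agent's state drifts toward the network average, which is why each agent can treat N times its local consensus count as a delayed proxy for the total number of pulls.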

Related research

Bayesian Algorithms for Decentralized Stochastic Bandits (10/20/2020)
We study a decentralized cooperative multi-agent multi-armed bandit prob...

Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound (11/22/2021)
This paper studies a decentralized multi-armed bandit problem in a multi...

Distributed Bandits: Probabilistic Communication on d-regular Graphs (11/16/2020)
We study the decentralized multi-agent multi-armed bandit problem for ag...

Bandits with Delayed Anonymous Feedback (09/20/2017)
We study the bandits with delayed anonymous feedback problem, a variant ...

Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits (03/03/2020)
We study a distributed decision-making problem in which multiple agents ...

Response Prediction for Low-Regret Agents (11/05/2019)
Companies like Google and Microsoft run billions of auctions every day t...

Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards (06/08/2023)
We study a decentralized multi-agent multi-armed bandit problem in which...
