Distributed Thompson Sampling

12/03/2020
by Jing Dong, et al.

We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms, in which the goal of the agents is to minimize the cumulative regret. We first adapt the traditional Thompson Sampling algorithm to the distributed setting. Since the agents are able to communicate, we observe that communication can further reduce the regret upper bound of a distributed Thompson Sampling approach. To further improve its performance, we propose a distributed Elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively. We analyze the algorithm under Bernoulli rewards and derive a problem-dependent upper bound on the cumulative regret.
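To make the setting concrete, here is a minimal sketch of Bernoulli Thompson Sampling run by several agents that pool their observations after every round. This is only an illustration of how communication can sharpen the shared posterior; the helper name, the pooling schedule, and all parameters are assumptions for this sketch, and the paper's elimination-based protocol is not reproduced here.

```python
import random

def distributed_thompson_sampling(true_means, n_agents=3, horizon=2000, seed=0):
    """Illustrative sketch (not the paper's algorithm): each agent runs
    Bernoulli Thompson Sampling, and after every round all agents merge
    their observations into shared Beta posterior counts."""
    rng = random.Random(seed)
    k = len(true_means)
    succ = [0] * k  # pooled successes per arm
    fail = [0] * k  # pooled failures per arm
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        round_succ = [0] * k
        round_fail = [0] * k
        for _agent in range(n_agents):
            # Each agent samples a mean estimate per arm from the shared
            # Beta(successes + 1, failures + 1) posterior and plays the argmax.
            samples = [rng.betavariate(succ[a] + 1, fail[a] + 1) for a in range(k)]
            arm = max(range(k), key=lambda a: samples[a])
            reward = 1 if rng.random() < true_means[arm] else 0
            regret += best - true_means[arm]
            (round_succ if reward else round_fail)[arm] += 1
        # "Communication" step: pool this round's observations from all agents.
        for a in range(k):
            succ[a] += round_succ[a]
            fail[a] += round_fail[a]
    return succ, fail, regret

succ, fail, regret = distributed_thompson_sampling([0.2, 0.5, 0.8])
```

Because every agent's pulls feed the shared posterior, the pooled counts concentrate on the best arm faster than any single agent's would, which is the intuition behind communication reducing the regret bound.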


Related research:

- Bayesian Algorithms for Decentralized Stochastic Bandits (10/20/2020): We study a decentralized cooperative multi-agent multi-armed bandit prob...
- Multi-Robot Dynamical Source Seeking in Unknown Environments (03/19/2021): This paper presents an algorithmic framework for the distributed on-line...
- Distributed Online Learning for Joint Regret with Communication Constraints (02/15/2021): In this paper we consider a distributed online learning setting for jo...
- On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits (11/30/2022): We consider the nonstochastic multi-agent multi-armed bandit problem wit...
- Cost-aware Cascading Bandits (05/22/2018): In this paper, we propose a cost-aware cascading bandits model, a new va...
- On the design of autonomous agents from multiple data sources (03/19/2021): This paper is concerned with the problem of designing agents able to dyn...
- Cooperative Training for Attribute-Distributed Data: Trade-off Between Data Transmission and Performance (07/29/2009): This paper introduces a modeling framework for distributed regression wi...
