Bayesian Algorithms for Decentralized Stochastic Bandits

10/20/2020
by   Anusha Lalitha, et al.

We study a decentralized cooperative multi-agent multi-armed bandit problem with K arms and N agents connected over a network. In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps. In each round, every agent chooses an arm to play and subsequently sends a message to its neighbors. The goal is to minimize cumulative regret averaged over the entire network. We propose a decentralized Bayesian multi-armed bandit framework that extends single-agent Bayesian bandit algorithms to the decentralized setting. Specifically, we study an information assimilation algorithm that can be combined with existing Bayesian algorithms, and using it, we propose a decentralized Thompson Sampling algorithm and a decentralized Bayes-UCB algorithm. We analyze the decentralized Thompson Sampling algorithm under Bernoulli rewards and establish a problem-dependent upper bound on the cumulative regret. We show that the regret incurred scales logarithmically with the time horizon, with constants that match those of an optimal centralized agent with access to all observations across the network. Our analysis also characterizes the cumulative regret in terms of the network structure. Through extensive numerical studies, we show that our extensions of Thompson Sampling and Bayes-UCB incur lower cumulative regret than state-of-the-art algorithms inspired by the Upper Confidence Bound algorithm. We also implement our proposed decentralized Thompson Sampling under a gossip protocol and over time-varying networks, where each communication link has a fixed probability of failure.
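To make the setting concrete, the following is a minimal Python sketch of Thompson Sampling with Bernoulli rewards run by several agents on a fixed network. It is not the information assimilation algorithm of the paper, which the abstract does not specify. The message rule used here (each agent sends its latest arm and reward to its neighbors, who add it to their own Beta posterior counts) is an assumption made purely for illustration, as are the function name `run_decentralized_ts`, its parameters, and the Beta(1, 1) prior.

```python
import random


def run_decentralized_ts(means, neighbors, horizon, seed=0):
    """Simulate a simplified decentralized Thompson Sampling run.

    means:     true Bernoulli mean of each arm (identical for all agents).
    neighbors: dict mapping agent id (0..N-1) to a list of neighbor ids.
    horizon:   number of rounds to play.
    Returns the cumulative pseudo-regret averaged over the agents.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(neighbors), len(means)
    best = max(means)
    # Beta(1, 1) prior per agent and arm (assumed): alpha tracks successes,
    # beta tracks failures.
    alpha = [[1] * n_arms for _ in range(n_agents)]
    beta = [[1] * n_arms for _ in range(n_agents)]
    regret = 0.0
    for _ in range(horizon):
        observations = []
        for i in range(n_agents):
            # Thompson Sampling: draw one sample per arm from the posterior
            # and play the arm with the largest sample.
            samples = [rng.betavariate(alpha[i][k], beta[i][k])
                       for k in range(n_arms)]
            arm = max(range(n_arms), key=samples.__getitem__)
            reward = 1 if rng.random() < means[arm] else 0
            observations.append((arm, reward))
            regret += best - means[arm]
        # Assumed message rule (not from the paper): every agent updates its
        # posterior with its own observation and those of its neighbors.
        for i in range(n_agents):
            for j in [i] + list(neighbors[i]):
                arm, reward = observations[j]
                alpha[i][arm] += reward
                beta[i][arm] += 1 - reward
    return regret / n_agents


if __name__ == "__main__":
    # Four agents on a ring, two arms with means 0.9 and 0.1.
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(run_decentralized_ts([0.9, 0.1], ring, horizon=500))
```

Passing a `neighbors` dictionary with empty lists removes all communication, which gives a rough baseline for seeing how much sharing observations helps in this simplified model.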

