Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

03/03/2020
by Peter Landgren, et al.

We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model, in which two or more agents can choose the same arm and collect independent rewards, and a constrained reward model, in which agents that choose the same arm at the same time receive no reward. We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of the mean reward at each arm. We leverage the estimates from this algorithm to develop two distributed algorithms, coop-UCB2 and coop-UCB2-selective-learning, for the unconstrained and constrained reward models, respectively. We show that both algorithms achieve group performance close to that of a centralized fusion center. Further, we investigate the influence of the communication graph structure on performance. We propose a novel graph explore-exploit index that predicts the relative performance of groups in terms of the communication graph, and a novel nodal explore-exploit centrality index that predicts the relative performance of agents in terms of their locations in the communication graph.
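To make the setting concrete, below is a minimal, hypothetical sketch of a consensus-based cooperative UCB loop for the unconstrained reward model. It is not the paper's exact coop-UCB2 algorithm: the communication graph (a 4-cycle), the Metropolis consensus weights, the exploration constant, and the initialization are illustrative assumptions chosen only to show how per-agent UCB decisions can be combined with averaging of estimates over a fixed graph.

```python
# Sketch (assumed, not the authors' coop-UCB2): N agents play the same K-armed
# Bernoulli bandit, keep running estimates of reward sums and pull counts, pick
# arms with a UCB index, and then average their estimates with neighbors via a
# doubly-stochastic consensus matrix P built from the communication graph.
import numpy as np

rng = np.random.default_rng(0)

K, N, T = 5, 4, 2000                      # arms, agents, rounds (illustrative)
mu = rng.uniform(0.1, 0.9, size=K)        # hidden Bernoulli arm means

# Fixed communication graph: a 4-cycle, encoded as an adjacency matrix.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# Metropolis weights give a doubly-stochastic consensus matrix P.
deg = A.sum(axis=1)
P = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if A[i, j]:
            P[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    P[i, i] = 1.0 - P[i].sum()

s_hat = np.zeros((N, K))   # per-agent estimates of cumulative rewards per arm
n_hat = np.ones((N, K))    # per-agent estimates of pull counts (1 avoids /0)

total_reward = 0.0
for t in range(1, T + 1):
    # Each agent chooses an arm with a UCB index on its own estimates.
    # Unconstrained model: simultaneous pulls yield independent rewards.
    ucb = s_hat / n_hat + np.sqrt(2.0 * np.log(t) / n_hat)
    arms = ucb.argmax(axis=1)
    rewards = rng.binomial(1, mu[arms]).astype(float)
    total_reward += rewards.sum()

    # Local updates, then one consensus (averaging) step over the graph.
    local_s = np.zeros((N, K)); local_n = np.zeros((N, K))
    local_s[np.arange(N), arms] = rewards
    local_n[np.arange(N), arms] = 1.0
    s_hat = P @ (s_hat + local_s)
    n_hat = P @ (n_hat + local_n)

print("best arm:", mu.argmax(),
      "group pseudo-regret:", T * N * mu.max() - total_reward)
```

In this toy version, the consensus step spreads each agent's observations across the graph, so agents far from where an arm was sampled still refine their estimates; the graph structure governs how quickly that information mixes, which is the effect the graph and nodal explore-exploit indices above are meant to capture.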

