An Option and Agent Selection Policy with Logarithmic Regret for Multi Agent Multi Armed Bandit Problems on Random Graphs

10/07/2019
by Pathmanathan Pankayaraj, et al.

Existing studies of the Multi Agent Multi Armed Bandit (MAMAB) problem, with a very few exceptions, assume that agents observe their neighbors through a static network graph, and most rely on a running consensus to estimate the option rewards. Two of the exceptions consider a setting in which agents observe the instantaneous rewards and actions of their neighbors through a communication strategy based on an i.i.d. Erdős–Rényi (ER) graph process. In this paper we propose a UCB-based option allocation rule that guarantees logarithmic regret even when the graph depends on the history of choices made by the agents. We also propose a novel communication strategy that significantly outperforms the i.i.d. ER graph based strategy. Under both the ER graph and the history-dependent graph strategies, the regret depends on the connectivity of the graph in a particularly interesting way: there exists an optimal connectivity that is strictly less than full connectivity.
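The abstract does not spell out the allocation rule itself, so the following is only a minimal Python sketch of the kind of scheme it describes: each agent runs a standard UCB1 index on its own estimates and, every round, also folds in the rewards observed by whichever neighbors a freshly drawn i.i.d. ER graph exposes. Bernoulli rewards, the UCB1 index, and all names (mamab_ucb_er, p_edge, and so on) are illustrative assumptions, not taken from the paper.

    import numpy as np

    def mamab_ucb_er(mu, n_agents=5, horizon=2000, p_edge=0.3, seed=0):
        """Hedged sketch: cooperative UCB1 where agents share instantaneous
        rewards/actions over a fresh i.i.d. ER graph G(n_agents, p_edge)."""
        rng = np.random.default_rng(seed)
        k = len(mu)                              # number of options (arms)
        counts = np.zeros((n_agents, k))         # per-agent observation counts
        sums = np.zeros((n_agents, k))           # per-agent reward sums
        regret, best = 0.0, max(mu)
        for t in range(1, horizon + 1):
            choices = np.empty(n_agents, dtype=int)
            rewards = np.empty(n_agents)
            for a in range(n_agents):
                if np.any(counts[a] == 0):       # sample each arm once first
                    choices[a] = int(np.argmin(counts[a]))
                else:                            # UCB1 index on pooled estimates
                    ucb = sums[a] / counts[a] + np.sqrt(2 * np.log(t) / counts[a])
                    choices[a] = int(np.argmax(ucb))
                rewards[a] = rng.binomial(1, mu[choices[a]])  # Bernoulli reward
                regret += best - mu[choices[a]]
            # Fresh i.i.d. ER graph: agent a observes agent b iff edge (a, b) exists.
            adj = rng.random((n_agents, n_agents)) < p_edge
            adj = np.triu(adj, 1)
            adj = adj | adj.T                    # symmetric, no self-loops
            for a in range(n_agents):
                counts[a, choices[a]] += 1
                sums[a, choices[a]] += rewards[a]
                for b in np.flatnonzero(adj[a]): # neighbors' rewards and actions
                    counts[a, choices[b]] += 1
                    sums[a, choices[b]] += rewards[b]
        return regret

    print(mamab_ucb_er(mu=[0.3, 0.5, 0.7]))

Sweeping p_edge in a simulation like this is one way to probe the connectivity effect the abstract highlights: if the paper's claim carries over, cumulative regret should be minimized at some edge probability strictly below 1 rather than at full connectivity.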
