Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds

01/27/2023
by   Johan Östman, et al.
0

We consider a decentralized multiplayer game, played over T rounds, with a leader-follower hierarchy described by a directed acyclic graph. For each round, the graph structure dictates the order of the players and how players observe the actions of one another. By the end of each round, all players receive a joint bandit-reward based on their joint action that is used to update the player strategies towards the goal of minimizing the joint pseudo-regret. We present a learning algorithm inspired by the single-player multi-armed bandit problem and show that it achieves sub-linear joint pseudo-regret in the number of rounds for both adversarial and stochastic bandit rewards. Furthermore, we quantify the cost incurred due to the decentralized nature of our problem compared to the centralized setting.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/07/2021

Online Learning for Cooperative Multi-Player Multi-Armed Bandits

We introduce a framework for decentralized online learning for multi-arm...
research
05/04/2015

On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits

We consider the problem of learning in single-player and multiplayer mul...
research
10/26/2018

Distributed Multi-Player Bandits - a Game of Thrones Approach

We consider a multi-armed bandit game where N players compete for K arms...
research
02/21/2018

Universal Growth in Production Economies

We study a simple variant of the von Neumann model of an expanding econo...
research
10/29/2020

Multitask Bandit Learning through Heterogeneous Feedback Aggregation

In many real-world applications, multiple agents seek to learn how to pe...
research
03/13/2015

Interactive Restless Multi-armed Bandit Game and Swarm Intelligence Effect

We obtain the conditions for the emergence of the swarm intelligence eff...
research
10/14/2022

Stability of Decentralized Queueing Networks Beyond Complete Bipartite Cases

Gaitonde and Tardos recently studied a model of queueing networks where ...

Please sign up or login with your details

Forgot password? Click here to reset