Collaborative Multi-agent Stochastic Linear Bandits

05/12/2022 ∙ by Ahmadreza Moradipari, et al.

We study a collaborative multi-agent stochastic linear bandit setting, where N agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action w.r.t. the average of their reward parameters. At each round, each agent proposes an action, and one action is randomly selected and played as the network action. All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents. We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its T-round regret in which we include a linear growth of regret associated with each communication round. Our regret bound is of order $\mathcal{O}\big(\sqrt{T/(N \log(1/|\lambda_2|))} \cdot (\log T)^2\big)$, where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the communication matrix.
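
The role of λ_2 in the consensus step can be illustrated with a minimal sketch. It is not the accelerated procedure of the paper: it uses plain repeated averaging on a hypothetical ring network, and the names `ring_matrix`, `consensus`, and `deviation`, the edge weights, and the reward values are all assumptions made for illustration. For a symmetric, doubly stochastic communication matrix, the distance of the agents' estimates from the true average shrinks by at least a factor |λ_2| per communication round.

```python
import math


def ring_matrix(n):
    """Hypothetical communication matrix: n agents on a ring.

    Each agent keeps weight 1/2 for itself and gives 1/4 to each neighbour,
    so the matrix is symmetric and doubly stochastic.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def consensus(W, values, rounds):
    """Plain (non-accelerated) consensus: every round, x <- W x."""
    x = list(values)
    for _ in range(rounds):
        x = [sum(w * v for w, v in zip(row, x)) for row in W]
    return x


def deviation(x):
    """Euclidean distance between the agents' values and their average."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x))


n_agents = 6
rounds = 20
rewards = [1.0, 0.0, 2.0, 5.0, 3.0, 1.0]  # made-up local observations

W = ring_matrix(n_agents)
estimates = consensus(W, rewards, rounds)

# For this circulant ring matrix the eigenvalues are 1/2 + 1/2*cos(2*pi*k/n),
# so the second largest in absolute value is attained at k = 1.
lambda_2 = 0.5 + 0.5 * math.cos(2 * math.pi / n_agents)

print("true average:", sum(rewards) / n_agents)
print("estimates after consensus:", estimates)
print("deviation:", deviation(estimates))
print("bound lambda_2**rounds * initial deviation:",
      lambda_2 ** rounds * deviation(rewards))
```

A smaller |λ_2| (a better connected network) means fewer communication rounds are needed to reach a given accuracy, which is consistent with the log(1/|λ_2|) term in the regret bound above.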


research
∙ 02/10/2021

Multi-Agent Multi-Armed Bandits with Limited Communication

We consider the problem where N agents collaboratively interact with an ...
research
∙ 07/02/2020

Multi-Agent Low-Dimensional Linear Bandits

We study a multi-agent stochastic linear bandit with side information, p...
research
∙ 02/15/2021

Distributed Online Learning for Joint Regret with Communication Constraints

In this paper we consider a distributed online learning setting for jo...
research
∙ 07/29/2019

Reinforcement with Fading Memories

We study the effect of imperfect memory on decision making in the contex...
research
∙ 12/01/2020

Decentralized Multi-Agent Linear Bandits with Safety Constraints

We study decentralized stochastic linear bandits, where a network of N a...
research
∙ 02/15/2021

Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

This paper studies bandit algorithms under data poisoning attacks in a b...
research
∙ 05/30/2023

Cooperative Thresholded Lasso for Sparse Linear Bandit

We present a novel approach to address the multi-agent sparse contextual...
