Regret Lower Bounds in Multi-agent Multi-armed Bandit

08/15/2023
by   Mengfan Xu, et al.
0

Multi-armed Bandit motivates methods with provable upper bounds on regret and also the counterpart lower bounds have been extensively studied in this context. Recently, Multi-agent Multi-armed Bandit has gained significant traction in various domains, where individual clients face bandit problems in a distributed manner and the objective is the overall system performance, typically measured by regret. While efficient algorithms with regret upper bounds have emerged, limited attention has been given to the corresponding regret lower bounds, except for a recent lower bound for adversarial settings, which, however, has a gap with let known upper bounds. To this end, we herein provide the first comprehensive study on regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are stochastically distributed, we demonstrate a lower bound of order O(log T) for instance-dependent bounds and √(T) for mean-gap independent bounds which are tight. Assuming adversarial rewards, we establish a lower bound O(T^2/3) for connected graphs, thereby bridging the gap between the lower and upper bound in the prior work. We also show a linear regret lower bound when the graph is disconnected. While previous works have explored these settings with upper bounds, we provide a thorough study on tight lower bounds.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/01/2022

Pareto Regret Analyses in Multi-objective Multi-armed Bandit

We study Pareto optimality in multi-objective multi-armed bandit by prov...
research
06/08/2023

Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards

We study a decentralized multi-agent multi-armed bandit problem in which...
research
07/12/2022

Optimal Clustering with Noisy Queries via Multi-Armed Bandit

Motivated by many applications, we study clustering with a faulty oracle...
research
11/11/2021

Solving Multi-Arm Bandit Using a Few Bits of Communication

The multi-armed bandit (MAB) problem is an active learning framework tha...
research
06/14/2022

On the Finite-Time Performance of the Knowledge Gradient Algorithm

The knowledge gradient (KG) algorithm is a popular and effective algorit...
research
09/22/2021

On Optimal Robustness to Adversarial Corruption in Online Decision Problems

This paper considers two fundamental sequential decision-making problems...
research
10/29/2020

Multitask Bandit Learning through Heterogeneous Feedback Aggregation

In many real-world applications, multiple agents seek to learn how to pe...

Please sign up or login with your details

Forgot password? Click here to reset