A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

02/02/2020
by   Chung-Wei Lee, et al.
0

We study small-loss bounds for the adversarial multi-armed bandits problem with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, instead of the total number of rounds. We derive the first small-loss bound for general strongly observable graphs, resolving an open problem proposed in (Lykouris et al., 2018). Specifically, we develop an algorithm with regret Õ(√(κ L_*)) where κ is the clique partition number and L_* is the loss of the best arm, and for the special case where every arm has a self-loop, we improve the regret to Õ(min{√(α T), √(κ L_*)}) where α≤κ is the independence number. Our results significantly improve and extend those by Lykouris et al. (2018) who only consider self-aware undirected graphs. Furthermore, we also take the first attempt at deriving small-loss bounds for weakly observable graphs. We first prove that no typical small-loss bounds are achievable in this case, and then propose algorithms with alternative small-loss bounds in terms of the loss of some specific subset of arms. A surprising side result is that Õ(√(T)) regret is achievable even for weakly observable graphs as long as the best arm has a self-loop. Our algorithms are based on the Online Mirror Descent framework but require a suite of novel techniques that might be of independent interest. Moreover, all our algorithms can be made parameter-free without the knowledge of the environment.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/04/2022

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

We study high-probability regret bounds for adversarial K-armed bandits ...
research
06/02/2022

Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

This study considers online learning with general directed feedback grap...
research
05/30/2022

Improved Algorithms for Bandit with Graph Feedback via Regret Decomposition

The problem of bandit with graph feedback generalizes both the multi-arm...
research
06/16/2022

Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

The problem of online learning with graph feedback has been extensively ...
research
12/27/2021

Tracking Most Severe Arm Changes in Bandits

In bandits with distribution shifts, one aims to automatically detect an...
research
10/14/2019

An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays

We propose a new algorithm for adversarial multi-armed bandits with unre...
research
05/30/2022

Adversarial Bandits Robust to S-Switch Regret

We study the adversarial bandit problem under S number of switching best...

Please sign up or login with your details

Forgot password? Click here to reset