Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

10/04/2022
by Haipeng Luo, et al.

We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\mathcal{O}((\sum_{t=1}^{T} \alpha_t)^{1/2} + \max_{t \in [T]} \alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$. Compared to the best existing result [Neu, 2015], which considers only graphs with self-loops for all nodes, our result not only holds more generally but, importantly, also removes any $\mathrm{poly}(K)$ dependence, which can be prohibitively large for applications such as contextual bandits. Furthermore, we develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs; through a refined analysis, it even improves on the best expected regret bound of [Alon et al., 2015] by removing the $\mathcal{O}(\sqrt{KT})$ term. Our algorithms are based on the online mirror descent framework, combined with several techniques in a novel way. Notably, while earlier works use optimistic biased loss estimators to achieve high-probability bounds, we find it important to use a pessimistic one for nodes without a self-loop in a strongly observable graph.
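To make the graph notions in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes the standard definitions from the feedback-graph literature (as in Alon et al., 2015), which the abstract itself does not spell out: a node is strongly observable if it has a self-loop or is observed by every other node, and weakly observable if it is observed by at least one node but is not strongly observable. The function names and the adjacency-matrix encoding are hypothetical choices made for illustration, and the independence number is computed by brute force, which is only practical for small K.

```python
from itertools import combinations


def classify_feedback_graph(adj):
    """Classify a directed feedback graph (assumed standard definitions).

    adj[i][j] is True when playing arm i reveals the loss of arm j.
    Returns "strongly observable", "weakly observable", or "unobservable".
    """
    k = len(adj)
    all_strong = True
    for j in range(k):
        in_nbrs = {i for i in range(k) if adj[i][j]}
        if not in_nbrs:
            # No action ever reveals the loss of arm j.
            return "unobservable"
        has_self_loop = j in in_nbrs
        seen_by_all_others = in_nbrs >= set(range(k)) - {j}
        if not (has_self_loop or seen_by_all_others):
            all_strong = False
    return "strongly observable" if all_strong else "weakly observable"


def independence_number(adj):
    """Brute-force size of the largest set of nodes with no edge between
    any two distinct members (edge direction and self-loops are ignored)."""
    k = len(adj)
    for size in range(k, 0, -1):
        for subset in combinations(range(k), size):
            if all(not adj[u][v] and not adj[v][u]
                   for u, v in combinations(subset, 2)):
                return size
    return 0


# Illustrative graphs on K = 3 arms (hypothetical examples).
bandit = [[i == j for j in range(3)] for i in range(3)]           # self-loops only
full_info = [[True] * 3 for _ in range(3)]                        # every arm reveals all
loopless_clique = [[i != j for j in range(3)] for i in range(3)]  # no self-loops
star = [[True, True, True],                                       # arm 0 reveals everything,
        [False, False, False],                                    # arms 1 and 2 reveal nothing
        [False, False, False]]

for name, g in [("bandit", bandit), ("full_info", full_info),
                ("loopless_clique", loopless_clique), ("star", star)]:
    print(name, classify_feedback_graph(g), independence_number(g))
```

The loopless clique is the kind of graph the abstract singles out: it is strongly observable even though no node has a self-loop, so it falls outside the setting of [Neu, 2015]. The star graph is weakly observable because arms 1 and 2 are revealed only by arm 0.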


