Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

06/02/2022
by Shinji Ito, et al.

This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret bounds for stochastic environments. As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: strongly observable graphs yield minimax regret of Θ̃(α^{1/2} T^{1/2}), while weakly observable graphs induce minimax regret of Θ̃(δ^{1/3} T^{2/3}), where α and δ represent, respectively, the independence number of the graph and the domination number of a certain portion of the graph. Our proposed algorithm for strongly observable graphs has a regret bound of Õ(α^{1/2} T^{1/2}) for adversarial environments and of O(α (ln T)^3 / Δ_min) for stochastic environments, where Δ_min denotes the minimum suboptimality gap. This result resolves an open question raised by Erez and Koren [2021]. We also provide an algorithm for weakly observable graphs that achieves a regret bound of Õ(δ^{1/3} T^{2/3}) for adversarial environments and poly-logarithmic regret for stochastic environments. The proposed algorithms are based on the follow-the-perturbed-leader approach combined with newly designed update rules for learning rates.
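To make the graph quantities in the abstract concrete, the following minimal sketch classifies a small directed feedback graph and computes its independence number α by brute force. It is not the paper's algorithm. It assumes the definitions commonly attributed to Alon et al. [2015]: an edge (i, j) means that playing action i reveals the loss of action j; a vertex is observable if it has at least one in-neighbor, and strongly observable if it has a self-loop or an incoming edge from every other vertex. The function names `classify` and `independence_number` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def classify(n, edges):
    """Classify a feedback graph on vertices 0..n-1.

    edges is a set of (i, j) pairs; (i, j) means playing i reveals j's loss.
    Definitions are assumed to follow Alon et al. [2015].
    """
    in_nbrs = {j: {i for (i, k) in edges if k == j} for j in range(n)}
    # A vertex with no in-neighbor can never be observed.
    if any(not in_nbrs[j] for j in range(n)):
        return "unobservable"

    def strongly_observable(j):
        others = set(range(n)) - {j}
        # Self-loop, or observed by every other vertex.
        return j in in_nbrs[j] or others <= in_nbrs[j]

    if all(strongly_observable(j) for j in range(n)):
        return "strongly observable"
    return "weakly observable"


def independence_number(n, edges):
    """Largest vertex set with no edge (in either direction) between
    two distinct members. Brute force, so only suitable for tiny graphs."""
    adjacent = {frozenset(e) for e in edges if e[0] != e[1]}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in adjacent
                   for p in combinations(subset, 2)):
                return size
    return 0


# Bandit feedback: each action reveals only its own loss.
bandit = {(i, i) for i in range(3)}
# Full information: every action reveals every loss.
full_info = {(i, j) for i in range(3) for j in range(3)}
# Action 0 reveals 1 and 2, action 1 reveals 0; no self-loops.
weak = {(0, 1), (0, 2), (1, 0)}

for name, g in [("bandit", bandit), ("full_info", full_info), ("weak", weak)]:
    print(name, classify(3, g), independence_number(3, g))
```

Under these assumed definitions, the bandit graph is strongly observable with α equal to the number of actions, the full-information graph is strongly observable with α = 1, and the third graph is only weakly observable, which is the regime where the T^{2/3} rate applies.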


