A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

06/01/2022
by   Chloé Rouyer, et al.

We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O}(\sqrt{\alpha T})$, where $T$ is the time horizon and $\alpha$ is the independence number of the feedback graph. The bound against stochastic environments is $O\big((\ln T)^2 \max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_i^{-1}\big)$, where $\mathcal{I}(G)$ is the family of all independent sets in a suitably defined undirected version of the graph and $\Delta_i$ are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtaining best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.
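
To make the two graph-dependent quantities in these bounds concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes the undirected version of the graph is already given as an edge list (the abstract does not say how it is derived from the directed feedback graph), and it assumes arms with zero gap contribute nothing to the sum. The function names `independent_sets`, `independence_number`, and `gap_term` are hypothetical, and the enumeration is exponential in the number of actions, so it is only meant for tiny examples.

```python
from itertools import combinations


def independent_sets(n, edges):
    """Yield every non-empty independent set of an undirected graph on nodes 0..n-1."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # A subset is independent if no two of its nodes share an edge.
            if all(frozenset(pair) not in adjacent for pair in combinations(subset, 2)):
                yield subset


def independence_number(n, edges):
    """Size of the largest independent set (the alpha in the adversarial bound)."""
    return max(len(s) for s in independent_sets(n, edges))


def gap_term(n, edges, gaps):
    """Max over independent sets S of sum_{i in S} 1/gaps[i].

    Assumption: arms with zero gap (optimal arms) are skipped in the sum.
    """
    return max(
        sum(1.0 / gaps[i] for i in s if gaps[i] > 0)
        for s in independent_sets(n, edges)
    )


# Hypothetical example: a path graph 0 - 1 - 2 - 3, with arm 0 optimal.
path_edges = [(0, 1), (1, 2), (2, 3)]
path_gaps = [0.0, 0.5, 0.25, 0.1]
alpha = independence_number(4, path_edges)
term = gap_term(4, path_edges, path_gaps)
```

On this path graph the largest independent sets have two nodes, and the gap term is attained by the set {1, 3}. Under these assumptions, denser graphs shrink both quantities, which is the sense in which richer feedback reduces regret.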


research
05/23/2016

Online Learning with Feedback Graphs Without the Graphs

We study an online learning framework introduced by Mannor and Shamir (2...
research
06/16/2022

Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

The problem of online learning with graph feedback has been extensively ...
research
10/09/2022

Learning on the Edge: Online Learning with Stochastic Feedback Graphs

The framework of feedback graphs is a generalization of sequential decis...
research
05/06/2023

An improved regret analysis for UCB-N and TS-N

In the setting of stochastic online learning with undirected feedback gr...
research
10/14/2020

Online Learning with Vector Costs and Bandits with Knapsacks

We introduce online learning with vector costs where in each time ste...
research
06/07/2021

Beyond Bandit Feedback in Online Multiclass Classification

We study the problem of online multiclass classification in a setting wh...
research
10/11/2021

Online Graph Learning in Dynamic Environments

Inferring the underlying graph topology that characterizes structured da...
