Improved Algorithms for Bandit with Graph Feedback via Regret Decomposition

05/30/2022
by Yuchen He, et al.

The problem of bandit with graph feedback generalizes both the multi-armed bandit (MAB) problem and the problem of learning with expert advice by encoding in a directed graph how the loss vector can be observed in each round of the game. The minimax regret is closely related to the structure of the feedback graph, and this connection is far from fully understood. We propose a new algorithmic framework for the problem based on a partition of the feedback graph. Our analysis reveals the interplay between various parts of the graph by decomposing the regret into the sum of the regret caused by the small parts and the regret caused by their interaction. As a result, our algorithm can be viewed as an interpolation and generalization of the optimal algorithms for MAB and learning with expert advice. Our framework unifies previous algorithms for both strongly observable graphs and weakly observable graphs, resulting in improved and optimal regret bounds on a wide range of graph families, including graphs of bounded degree and strongly observable graphs with a few corrupted arms.
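
To make the feedback model concrete, the following is a minimal sketch of how a directed graph can encode which losses are revealed in a round. It is not the algorithm proposed in the paper. The helper names (`mab_graph`, `experts_graph`, `observed_losses`, `observability`) are hypothetical, and the strong/weak observability test follows the standard definitions from the graph-feedback literature rather than anything stated in this abstract.

```python
from typing import Dict, List, Set

# Assumption: a feedback graph maps each arm to the set of arms
# whose losses are revealed when that arm is played.
Graph = Dict[int, Set[int]]


def mab_graph(k: int) -> Graph:
    """Self-loops only: playing an arm reveals only its own loss (MAB)."""
    return {i: {i} for i in range(k)}


def experts_graph(k: int) -> Graph:
    """Complete graph with self-loops: every play reveals all losses (experts)."""
    return {i: set(range(k)) for i in range(k)}


def observed_losses(graph: Graph, played: int, losses: List[float]) -> Dict[int, float]:
    """Losses revealed to the learner after playing `played`."""
    return {j: losses[j] for j in sorted(graph[played])}


def observability(graph: Graph) -> str:
    """Classify a graph using the standard definitions (assumed here):
    an arm is strongly observable if it has a self-loop or is observed by
    every other arm; a graph is strongly observable if all arms are, weakly
    observable if every arm has some in-neighbor but not all are strongly
    observable, and unobservable otherwise."""
    arms = set(graph)
    in_nbrs = {j: {i for i in arms if j in graph[i]} for j in arms}
    if any(not in_nbrs[j] for j in arms):
        return "unobservable"
    strongly = all(j in in_nbrs[j] or arms - {j} <= in_nbrs[j] for j in arms)
    return "strong" if strongly else "weak"
```

For example, with losses `[0.1, 0.5, 0.9]`, playing arm 1 in `mab_graph(3)` reveals only the loss of arm 1, while the same play in `experts_graph(3)` reveals all three losses. A directed 3-cycle without self-loops, `{0: {1}, 1: {2}, 2: {0}}`, is classified as weakly observable under these definitions.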


research
07/14/2023

On Interpolating Experts and Multi-Armed Bandits

Learning with expert advice and multi-armed bandit are two classic onlin...
research
06/02/2022

Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

This study considers online learning with general directed feedback grap...
research
12/17/2020

Experts with Lower-Bounded Loss Feedback: A Unifying Framework

The most prominent feedback models for the best expert problem are the f...
research
02/02/2020

A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

We study small-loss bounds for the adversarial multi-armed bandits probl...
research
10/04/2022

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

We study high-probability regret bounds for adversarial K-armed bandits ...
research
05/29/2021

Understanding Bandits with Graph Feedback

The bandit problem with graph feedback, proposed in [Mannor and Shamir, ...
research
05/22/2023

Bandit Submodular Maximization for Multi-Robot Coordination in Unpredictable and Partially Observable Environments

We study the problem of multi-agent coordination in unpredictable and pa...
