An improved regret analysis for UCB-N and TS-N

05/06/2023
by Nishant A. Mehta et al.

In the setting of stochastic online learning with undirected feedback graphs, Lykouris et al. (2020) previously analyzed the pseudo-regret of the upper-confidence-bound-based algorithm UCB-N and the Thompson-Sampling-based algorithm TS-N. In this note, we show how to improve their pseudo-regret analysis. Our improvement involves refining a key lemma of the previous analysis, allowing a log(T) factor to be replaced by a factor of log_2(α) + 3, where α is the independence number of the feedback graph.
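
Schematically, the contribution can be read as a replacement of one multiplicative factor in the pseudo-regret bound; the precise form of the bound that this factor multiplies is given in the note itself and in Lykouris et al. (2020), so the display below is only a sketch:

\[
\underbrace{\log(T)}_{\text{previous analysis}}
\;\longmapsto\;
\underbrace{\log_2(\alpha) + 3}_{\text{this note}},
\qquad \alpha := \text{independence number of the feedback graph.}
\]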

Related research

05/24/2023  On the Minimax Regret for Online Learning with Feedback Graphs
In this work, we improve on the upper and lower bounds for the regret of...

06/01/2022  A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
We consider online learning with feedback graphs, a sequential decision-...

05/23/2018  Analysis of Thompson Sampling for Graphical Bandits Without the Graphs
We study multi-armed bandit problems with graph feedback, in which the d...

07/20/2021  Best-of-All-Worlds Bounds for Online Learning with Feedback Graphs
We study the online learning with feedback graphs framework introduced b...

09/10/2019  Optimality of the Subgradient Algorithm in the Stochastic Setting
Recently Jaouad Mourtada and Stéphane Gaïffas showed the anytime hedge a...

05/25/2023  Fast Online Node Labeling for Very Large Graphs
This paper studies the online node classification problem under a transd...

01/16/2017  Thompson Sampling For Stochastic Bandits with Graph Feedback
We present a novel extension of Thompson Sampling for stochastic sequent...
