Online Learning with Feedback Graphs: The True Shape of Regret

06/05/2023
by   Tomáš Kocák, et al.

Sequential learning with feedback graphs is a natural extension of the multi-armed bandit problem in which an underlying graph structure provides additional information: playing an action reveals the losses of all of its neighbors. This problem was introduced by <cit.> and has received considerable attention in recent years. The literature generally states that the minimax regret rate for this problem is of order √(α T), where α is the independence number of the graph and T is the time horizon. However, this rate is proven only when the number of rounds T is larger than α^3, which significantly restricts the usefulness of the result for large graphs. In this paper, we define a new quantity R^*, called the problem complexity, and prove that the minimax regret is proportional to R^* for any graph and time horizon T. Using an intricate exploration strategy, we design an algorithm that achieves the minimax optimal regret bound, making it the first provably optimal algorithm for this setting, even when T is smaller than α^3.
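To make the setting concrete, the following minimal Python sketch illustrates the feedback model and the T > α^3 condition mentioned above. It is not the algorithm from the paper, and it does not compute R^*, whose definition is not given in the abstract. The example graph, the function names, and the choice that a played action also observes its own loss are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical undirected feedback graph: a path 0-1-2-3-4 (illustration only).
GRAPH = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}


def observed_actions(graph, action):
    """Return the actions whose losses are revealed when `action` is played.

    Assumption: the played action also observes its own loss.
    """
    return {action} | graph[action]


def independence_number(graph):
    """Brute-force size of the largest independent set (small graphs only)."""
    nodes = list(graph)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            # A subset is independent if no two of its nodes are adjacent.
            if all(v not in graph[u] for u, v in combinations(subset, 2)):
                return size
    return 0


def sqrt_alpha_t_regime(graph, horizon):
    """Return (alpha, sqrt(alpha * T), whether T > alpha^3 holds)."""
    alpha = independence_number(graph)
    return alpha, math.sqrt(alpha * horizon), horizon > alpha ** 3


if __name__ == "__main__":
    print(observed_actions(GRAPH, 2))
    for horizon in (20, 100):
        print(horizon, sqrt_alpha_t_regime(GRAPH, horizon))
```

On this path graph the independence number is 3, so the √(α T) rate is only known to apply once T exceeds 27. A horizon of 20 falls outside that regime, while a horizon of 100 falls inside it.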


research
03/03/2020

MOTS: Minimax Optimal Thompson Sampling

Thompson sampling is one of the most widely used algorithms for many onl...
research
07/29/2019

Bandits with Feedback Graphs and Switching Costs

We study the adversarial multi-armed bandit problem where partial observ...
research
07/14/2023

On Interpolating Experts and Multi-Armed Bandits

Learning with expert advice and multi-armed bandit are two classic onlin...
research
05/23/2018

Analysis of Thompson Sampling for Graphical Bandits Without the Graphs

We study multi-armed bandit problems with graph feedback, in which the d...
research
10/01/2021

Batched Thompson Sampling

We introduce a novel anytime Batched Thompson sampling policy for multi-...
research
06/16/2022

Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

The problem of online learning with graph feedback has been extensively ...
research
10/19/2019

On Adaptivity in Information-constrained Online Learning

We study how to adapt to smoothly-varying (`easy') environments in well-...
