Analysis of Thompson Sampling for Graphical Bandits Without the Graphs

05/23/2018
by   Fang Liu, et al.

We study multi-armed bandit problems with graph feedback, in which the decision maker is allowed to observe the outcomes of the actions neighboring the chosen action, in a setting where the graph may vary over time and is never fully revealed to the decision maker. We show that when the feedback graphs are undirected, the original Thompson Sampling achieves the optimal (within logarithmic factors) regret Õ(√(β_0(G)T)) over time horizon T, where β_0(G) is the average independence number of the latent graphs. To the best of our knowledge, this is the first result showing that the original Thompson Sampling is optimal for graphical bandits in the undirected setting. We also present a slightly weaker regret bound for Thompson Sampling in the directed setting. To close this gap, we propose a variant of Thompson Sampling that attains the optimal regret in the directed setting within a logarithmic factor. Both algorithms can be implemented efficiently and do not require knowledge of the feedback graphs at any time.
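To make the setting concrete, the following is a minimal sketch of Thompson Sampling with side observations. It is not taken from the paper: it assumes Bernoulli rewards with independent Beta(1, 1) priors, and the names `thompson_sampling_graph_feedback` and `neighbors_at` are hypothetical. The learner never inspects the graph before choosing an arm; it only learns, after playing, which arms' rewards were revealed in that round.

```python
import random


def thompson_sampling_graph_feedback(means, neighbors_at, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling that updates on every revealed arm.

    means:        true Bernoulli means, used only to simulate rewards
                  (assumption: hidden from the learner).
    neighbors_at: hypothetical callback (t, arm) -> iterable of arms whose
                  rewards are revealed after playing `arm` in round t.
                  The graph may change with t and is never queried
                  before the arm is chosen.
    Returns (cumulative pseudo-regret, alpha, beta).
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k  # posterior success counts (Beta(1, 1) prior)
    beta = [1.0] * k   # posterior failure counts
    best = max(means)
    regret = 0.0

    for t in range(horizon):
        # Draw one sample per arm from its posterior and play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        regret += best - means[arm]

        # The played arm is always observed; its neighbors come for free.
        observed = set(neighbors_at(t, arm)) | {arm}
        for j in observed:
            reward = 1 if rng.random() < means[j] else 0
            alpha[j] += reward
            beta[j] += 1 - reward

    return regret, alpha, beta
```

For example, `thompson_sampling_graph_feedback([0.2, 0.5, 0.8], lambda t, arm: range(3), 500)` simulates a complete feedback graph, while `lambda t, arm: []` reduces to the classical bandit setting in which only the played arm is observed.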


research
12/10/2020

Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

This paper studies the adversarial graphical contextual bandits, a varia...
research
11/08/2017

Information Directed Sampling for Stochastic Bandits with Graph Feedback

We consider stochastic multi-armed bandit problems with graph feedback, ...
research
07/17/2013

From Bandits to Experts: A Tale of Domination and Independence

We consider the partial observability model for multi-armed bandits, int...
research
03/24/2015

A Note on Information-Directed Sampling and Thompson Sampling

This note introduces three Bayesian style Multi-armed bandit algorithms: ...
research
05/06/2023

An improved regret analysis for UCB-N and TS-N

In the setting of stochastic online learning with undirected feedback gr...
research
10/09/2022

Learning on the Edge: Online Learning with Stochastic Feedback Graphs

The framework of feedback graphs is a generalization of sequential decis...
research
06/05/2023

Online Learning with Feedback Graphs: The True Shape of Regret

Sequential learning with feedback graphs is a natural extension of the m...
