Information Directed Sampling for Stochastic Bandits with Graph Feedback

11/08/2017
by Fang Liu, et al.

We consider stochastic multi-armed bandit problems with graph feedback, where the decision maker is allowed to observe the neighboring actions of the chosen action. We allow the graph structure to vary with time and consider both deterministic and Erdős-Rényi random graph models. For such a graph feedback model, we first present a novel analysis of Thompson sampling that leads to a tighter performance bound than existing work. Next, we propose new Information Directed Sampling (IDS) based policies that are graph-aware in their decision making. In the deterministic graph case, we establish a Bayesian regret bound for the proposed policies that scales with the clique cover number of the graph instead of the number of actions. In the random graph case, we provide a Bayesian regret bound for the proposed policies that scales with the ratio of the number of actions to the expected number of observations per iteration. To the best of our knowledge, this is the first analytical result for stochastic bandits with random graph feedback. Finally, using numerical evaluations, we demonstrate that our proposed IDS policies outperform existing approaches, including adaptations of upper confidence bound, ϵ-greedy and Exp3 algorithms.
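
To make the feedback model concrete, the following is a minimal sketch of Thompson sampling under Erdős-Rényi graph feedback. It is not the paper's implementation or its IDS policy. Everything specific here is an assumption made for illustration: Bernoulli rewards, independent Beta(1, 1) priors, a fresh random graph drawn every round, and the hypothetical names `thompson_sampling_graph_feedback`, `means`, `horizon` and `edge_prob`.

```python
import random


def thompson_sampling_graph_feedback(means, horizon, edge_prob, seed=0):
    """Illustrative sketch: Bernoulli Thompson sampling with side observations.

    Assumptions (not taken from the paper): Bernoulli rewards with the given
    `means`, Beta(1, 1) priors, and a new Erdos-Renyi graph each round in which
    every other action is a neighbor of the chosen one with probability
    `edge_prob`.
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k  # Beta posterior: 1 + observed successes
    beta = [1.0] * k   # Beta posterior: 1 + observed failures
    best_mean = max(means)
    regret = 0.0
    observations = 0

    for _ in range(horizon):
        # Draw one posterior sample per action and play the largest one.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        chosen = max(range(k), key=lambda i: samples[i])
        regret += best_mean - means[chosen]  # expected (pseudo) regret

        # The chosen action is always observed; each other action is
        # observed only if it is a neighbor in this round's random graph.
        observed = [
            i for i in range(k)
            if i == chosen or rng.random() < edge_prob
        ]

        # Update the posterior of every observed action, not just the chosen one.
        for i in observed:
            reward = 1 if rng.random() < means[i] else 0
            alpha[i] += reward
            beta[i] += 1 - reward
        observations += len(observed)

    return regret, observations, alpha, beta
```

With `edge_prob=0.0` this reduces to ordinary Thompson sampling with one observation per round, and with `edge_prob=1.0` every action is observed in every round. Intermediate values change the expected number of observations per iteration, which is the quantity the random-graph regret bound above depends on.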


