Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

07/09/2020
by Tong Yu, et al.

We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations, and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.
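
To make the sample-then-act loop concrete, here is a minimal sketch for one of the special cases named above, a cascading bandit. It is not the paper's algorithm: it assumes the posterior is exactly tractable, with an independent Beta posterior over each item's attraction probability, so no variational approximation is needed. The function name `cascading_ts`, its parameters, and the uniform Beta(1, 1) prior are hypothetical choices made for illustration.

```python
import random


def cascading_ts(attract_probs, k, n_rounds, seed=0):
    """Thompson sampling for a simulated cascading bandit (illustrative sketch).

    attract_probs: true per-item attraction probabilities, used only to
                   simulate the user; the learner never reads them directly.
    k:             number of items recommended per round.
    Returns the posterior mean of each item and the total number of clicks.
    """
    rng = random.Random(seed)
    n_items = len(attract_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, on every attraction probability.
    alpha = [1.0] * n_items
    beta = [1.0] * n_items
    clicks = 0

    for _ in range(n_rounds):
        # Step 1: sample model parameters from the current posterior.
        sampled = [rng.betavariate(alpha[i], beta[i]) for i in range(n_items)]

        # Step 2: pick the best action under the sample. In the cascade model
        # this is the k items with the highest sampled attraction, in order.
        ranking = sorted(range(n_items), key=lambda i: sampled[i], reverse=True)[:k]

        # Step 3: the simulated user scans the list top-down and clicks the
        # first attractive item. Items above the click are observed as
        # unattractive; items below the click are not observed at all.
        for item in ranking:
            if rng.random() < attract_probs[item]:
                alpha[item] += 1.0  # observed attractive
                clicks += 1
                break
            beta[item] += 1.0  # observed unattractive

    means = [alpha[i] / (alpha[i] + beta[i]) for i in range(n_items)]
    return means, clicks
```

Calling `cascading_ts([0.8, 0.2, 0.1, 0.05], k=2, n_rounds=2000)` returns the per-item posterior means and the click count for one simulated run. The partial feedback in step 3 is the kind of structure the framework is meant to capture. When latent variables make the posterior intractable, the exact Beta update here would have to be replaced by an approximate one.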


