Generalizing Hierarchical Bayesian Bandits

05/30/2022
by Imad Aouali, et al.

A contextual bandit is a popular and practical framework for online learning to act under uncertainty. In many problems, the number of actions is huge and their mean rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a two-level graphical model, where actions are related through multiple shared latent parameters. We propose a Thompson sampling algorithm, G-HierTS, that uses this structure to explore efficiently, and we bound its Bayes regret. The regret has two terms: one for learning the action parameters and one for learning the shared latent parameters. Both terms reflect the structure of our model as well as the quality of the priors. Our theoretical findings are validated empirically on both synthetic and real-world problems. We also experiment with a variant of G-HierTS that maintains a factored posterior over the latent parameters. While this approximation does not come with guarantees, it improves computational efficiency with minimal impact on empirical regret.
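The two-level idea can be illustrated with a small sketch. The code below is not the paper's G-HierTS; it is a simplified Gaussian hierarchical Thompson sampling with one latent parameter per group of actions (all names, variances, and the toy instance are assumptions for illustration). Each round it samples the latent group means from their marginal posteriors, then samples action parameters conditioned on those draws, and acts greedily on the samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-level Gaussian model (assumed for this sketch):
#   latent mean per group:  mu_g       ~ N(0, q2)
#   action parameter:       theta_a    ~ N(mu_{group(a)}, s2)
#   reward on pulling a:    r          ~ N(theta_a, r2)
q2, s2, r2 = 1.0, 0.25, 1.0
group = np.array([0, 0, 0, 1, 1, 1])          # action -> latent parameter index
mu_true = rng.normal(0.0, np.sqrt(q2), 2)
theta_true = rng.normal(mu_true[group], np.sqrt(s2))

K, T = len(group), 2000
n = np.zeros(K)       # pull counts per action
sums = np.zeros(K)    # summed rewards per action

for t in range(T):
    # Level 1: sample each latent mean from its marginal posterior,
    # integrating out the action parameters (Gaussian conjugacy).
    mu_sample = np.zeros(2)
    for g in range(2):
        prec, mean_num = 1.0 / q2, 0.0
        for a in np.where(group == g)[0]:
            if n[a] > 0:
                w = 1.0 / (s2 + r2 / n[a])    # weight after marginalizing theta_a
                prec += w
                mean_num += w * sums[a] / n[a]
        mu_sample[g] = rng.normal(mean_num / prec, np.sqrt(1.0 / prec))

    # Level 2: sample action parameters given the sampled latent means.
    prec_a = 1.0 / s2 + n / r2
    mean_a = (mu_sample[group] / s2 + sums / r2) / prec_a
    theta_sample = rng.normal(mean_a, np.sqrt(1.0 / prec_a))

    # Act greedily with respect to the sampled parameters, observe a reward.
    a = int(np.argmax(theta_sample))
    r = rng.normal(theta_true[a], np.sqrt(r2))
    n[a] += 1
    sums[a] += r
```

The key point mirrored from the abstract: information is shared across actions through the latent level, so pulls of one action tighten the posterior of its group's latent mean and thereby help all sibling actions. The paper's factored variant replaces the joint latent posterior with per-parameter factors for computational efficiency.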


research
02/03/2022

Deep Hierarchy in Bandits

Mean rewards of actions are often correlated. The form of these correlat...
research
07/09/2020

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

We propose a novel framework for structured bandits, which we call an in...
research
06/15/2020

Latent Bandits Revisited

A latent bandit problem is one in which the learning agent knows the arm...
research
04/25/2016

Double Thompson Sampling for Dueling Bandits

In this paper, we propose a Double Thompson Sampling (D-TS) algorithm fo...
research
06/10/2021

Thompson Sampling with a Mixture Prior

We study Thompson sampling (TS) in online decision-making problems where...
research
02/16/2023

Linear Bandits with Memory: from Rotting to Rising

Nonstationary phenomena, such as satiation effects in recommendation, ar...
research
06/03/2018

Conservative Exploration using Interleaving

In many practical problems, a learning agent may want to learn the best ...
