Deep Hierarchy in Bandits

02/03/2022
by Joey Hong, et al.

Mean rewards of actions are often correlated. The form of these correlations may be complex and unknown a priori, such as a user's preferences for recommended products and their categories. To maximize statistical efficiency, it is important to leverage these correlations when learning. We formulate a bandit variant of this problem in which the correlations of mean action rewards are represented by a hierarchical Bayesian model with latent variables. Since the hierarchy can have multiple layers, we call it deep. We propose a hierarchical Thompson sampling algorithm (HierTS) for this problem and show how to implement it efficiently for Gaussian hierarchies. The efficient implementation is possible due to a novel exact hierarchical representation of the posterior, which is of independent interest. We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits. Our analysis reflects the structure of the problem: the regret decreases with the width of the prior, and hierarchies reduce the regret by non-constant factors in the number of actions. We confirm these theoretical findings empirically in both synthetic and real-world experiments.
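To make the idea concrete, below is a minimal sketch of Thompson sampling with a two-level Gaussian hierarchy: a single latent variable is shared by all action means, each round the latent variable is sampled from its exact marginal posterior (with the action means integrated out), and the action means are then sampled conditioned on it. All parameter values and variable names (mu_0, tau_0, tau, sigma) are illustrative assumptions; the HierTS of the paper supports deeper hierarchies and a more general exact posterior representation, and this sketch is not the authors' implementation.

```python
# Minimal sketch: Thompson sampling in a two-level Gaussian hierarchy.
# Assumed model (not the paper's notation or code):
#   latent hyper-parameter:  mu      ~ N(mu_0, tau_0^2)
#   action means:            theta_a ~ N(mu, tau^2),  a = 1..K
#   rewards:                 r_t     ~ N(theta_{a_t}, sigma^2)
import numpy as np

rng = np.random.default_rng(0)

K, mu_0, tau_0, tau, sigma = 10, 0.0, 1.0, 0.5, 0.1
n_rounds = 2000

# Generate a problem instance from the assumed model.
mu_true = rng.normal(mu_0, tau_0)
theta_true = rng.normal(mu_true, tau, size=K)

# Sufficient statistics per action: pull counts and reward sums.
n = np.zeros(K)
s = np.zeros(K)

regret = 0.0
for t in range(n_rounds):
    # Exact marginal posterior of the latent mu, with theta_a integrated out.
    # Each played action contributes a Gaussian "observation" of mu whose
    # variance is tau^2 + sigma^2 / n_a.
    played = n > 0
    obs_var = tau**2 + sigma**2 / np.maximum(n, 1)
    prec = 1.0 / tau_0**2 + np.sum(1.0 / obs_var[played])
    mean = (mu_0 / tau_0**2
            + np.sum((s[played] / n[played]) / obs_var[played])) / prec
    mu_sample = rng.normal(mean, np.sqrt(1.0 / prec))

    # Posterior of each theta_a given the sampled mu (conjugate Gaussian update).
    post_prec = 1.0 / tau**2 + n / sigma**2
    post_mean = (mu_sample / tau**2 + s / sigma**2) / post_prec
    theta_sample = rng.normal(post_mean, np.sqrt(1.0 / post_prec))

    # Act greedily with respect to the sampled action means, then update stats.
    a = int(np.argmax(theta_sample))
    r = rng.normal(theta_true[a], sigma)
    n[a] += 1
    s[a] += r
    regret += theta_true.max() - theta_true[a]

print(f"cumulative regret after {n_rounds} rounds: {regret:.2f}")
```

The conjugate Gaussian structure is what keeps the hierarchical posterior exact in this sketch: each played action contributes a Gaussian observation of the latent variable with variance tau^2 + sigma^2/n_a, so both sampling steps have closed forms.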



Related research

05/30/2022 - Generalizing Hierarchical Bayesian Bandits
A contextual bandit is a popular and practical framework for online lear...

11/12/2021 - Hierarchical Bayesian Bandits
Meta-, multi-task, and federated learning can all be viewed as solving s...

06/15/2021 - Thompson Sampling for Unimodal Bandits
In this paper, we propose a Thompson Sampling algorithm for unimodal ban...

06/16/2021 - Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?
We study learning algorithms for the classical Markovian bandit problem ...

04/14/2020 - Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards
In this paper, we consider the problem of sleeping bandits with stochast...

07/09/2020 - Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
We propose a novel framework for structured bandits, which we call an in...

02/08/2021 - Correlated Bandits for Dynamic Pricing via the ARC algorithm
The Asymptotic Randomised Control (ARC) algorithm provides a rigorous ap...
