Meta-Thompson Sampling

02/11/2021
by Branislav Kveton, et al.

Efficient exploration in multi-armed bandits is a fundamental online learning problem. In this work, we propose a variant of Thompson sampling that learns to explore better as it interacts with problem instances drawn from an unknown prior distribution. Because the algorithm meta-learns the prior, we call it Meta-TS. We propose efficient implementations of Meta-TS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning the prior and is of broader interest, because we derive the first prior-dependent upper bound on the Bayes regret of Thompson sampling. Empirical evaluation complements this result, showing that Meta-TS quickly adapts to the unknown prior.
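The abstract does not reproduce the algorithm, but its structure in Gaussian bandits can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the paper's pseudocode: per-arm task means theta_i are drawn from N(mu_*[i], sigma_0^2), rewards are N(theta_i, sigma^2), the meta-posterior over mu_* is kept as an independent Gaussian per arm, and the meta-update treats each pulled arm's within-task sample mean as a single noisy observation of its prior mean. All names (meta_thompson_sampling, sample_task, sigma_q) and the exact meta-update rule are hypothetical.

```python
import numpy as np

def meta_thompson_sampling(sample_task, num_tasks=50, horizon=200, num_arms=5,
                           sigma=1.0, sigma_0=1.0, sigma_q=10.0, seed=0):
    """Sketch of Meta-TS for Gaussian bandits (illustrative, not the paper's code)."""
    rng = np.random.default_rng(seed)
    m = np.zeros(num_arms)               # meta-posterior means for mu_*
    v = np.full(num_arms, sigma_q ** 2)  # meta-posterior variances for mu_*

    for _ in range(num_tasks):
        theta = sample_task(rng)         # true arm means of the sampled task
        mu = rng.normal(m, np.sqrt(v))   # sample a prior mean from the meta-posterior

        # Vanilla Thompson sampling within the task, with prior N(mu[i], sigma_0^2).
        post_mean = mu.copy()
        post_var = np.full(num_arms, sigma_0 ** 2)
        sums, counts = np.zeros(num_arms), np.zeros(num_arms)
        for _ in range(horizon):
            arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
            reward = rng.normal(theta[arm], sigma)
            sums[arm] += reward
            counts[arm] += 1
            # Conjugate Gaussian posterior update for the pulled arm.
            post_var[arm] = 1.0 / (1.0 / sigma_0 ** 2 + counts[arm] / sigma ** 2)
            post_mean[arm] = post_var[arm] * (mu[arm] / sigma_0 ** 2
                                              + sums[arm] / sigma ** 2)

        # Meta-update (assumed form): a pulled arm's sample mean is a noisy
        # observation of mu_*[i] with variance sigma_0^2 + sigma^2 / counts[i].
        pulled = counts > 0
        n = np.maximum(counts, 1)
        obs_var = sigma_0 ** 2 + sigma ** 2 / n
        new_v = 1.0 / (1.0 / v + 1.0 / obs_var)
        new_m = new_v * (m / v + (sums / n) / obs_var)
        m[pulled], v[pulled] = new_m[pulled], new_v[pulled]
    return m, v

# Example: tasks drawn from an unknown Gaussian prior with means mu_star.
mu_star = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
m, v = meta_thompson_sampling(lambda rng: rng.normal(mu_star, 1.0))
```

Under these assumptions, the meta-posterior (m, v) concentrates around mu_star as tasks accumulate, so the per-task prior that Thompson sampling starts from improves over time, which is the benefit of meta-learning the prior that the analysis quantifies.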

Related research:

02/25/2022  Meta-Learning for Simple Regret Minimization
We develop a meta-learning framework for simple regret minimization in b...

05/31/2022  Online Meta-Learning in Adversarial Multi-Armed Bandits
We study meta-learning for adversarial multi-armed bandits. We consider ...

02/09/2021  Nonstochastic Bandits with Infinitely Many Experts
We study the problem of nonstochastic bandits with infinitely many exper...

07/03/2021  Bayesian decision-making under misspecified priors with applications to meta-learning
Thompson sampling and other Bayesian sequential decision-making algorith...

01/31/2022  Neural Collaborative Filtering Bandits via Meta Learning
Contextual multi-armed bandits provide powerful tools to solve the explo...

09/07/2019  AutoML for Contextual Bandits
Contextual Bandits is one of the widely popular techniques used in appli...

02/28/2019  Meta Dynamic Pricing: Learning Across Experiments
We study the problem of learning across a sequence of price experiments ...
