On the Prior Sensitivity of Thompson Sampling

06/10/2015
by   Che-Yu Liu, et al.

The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with p being the prior probability mass of the true reward-generating model, we prove O(√(T/p)) and O(√((1-p)T)) regret upper bounds for the bad- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge.
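As a rough illustration of how a prior enters Thompson Sampling, here is a minimal sketch, not the paper's construction. It assumes a finite set of candidate models (each a tuple of Bernoulli arm means strictly between 0 and 1), a prior over that set, and a true model that belongs to the set, so that p is the prior mass at `true_index`. The function name `thompson_sampling` and all of its parameters are hypothetical and chosen only for this example.

```python
import random


def thompson_sampling(models, prior, true_index, horizon, seed=0):
    """Run Thompson Sampling over a finite model set (illustrative sketch).

    models      -- list of tuples of Bernoulli arm means (assumed in (0, 1))
    prior       -- prior probability of each model; prior[true_index] is p
    true_index  -- index of the model that actually generates rewards
    horizon     -- number of rounds T
    Returns (cumulative expected regret, final posterior).
    """
    rng = random.Random(seed)
    posterior = list(prior)
    true_means = models[true_index]
    best_mean = max(true_means)
    n_arms = len(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Sample one model from the current posterior.
        sampled = rng.choices(range(len(models)), weights=posterior)[0]
        # Play the arm that is optimal under the sampled model.
        arm = max(range(n_arms), key=lambda a: models[sampled][a])
        # Observe a Bernoulli reward drawn from the true model.
        reward = 1 if rng.random() < true_means[arm] else 0
        regret += best_mean - true_means[arm]
        # Bayes update: reweight each model by the likelihood of the reward.
        likelihoods = [m[arm] if reward else 1.0 - m[arm] for m in models]
        weights = [w * l for w, l in zip(posterior, likelihoods)]
        total = sum(weights)
        posterior = [w / total for w in weights]

    return regret, posterior


if __name__ == "__main__":
    models = [(0.9, 0.1), (0.1, 0.9)]
    for p in (0.01, 0.5, 0.99):
        regret, _ = thompson_sampling(models, [p, 1.0 - p], 0, horizon=500)
        print(f"p = {p}: regret = {regret:.1f}")
```

Varying p in the loop at the bottom gives an informal feel for how regret in this toy setup changes with the prior mass on the true model; it is a single seeded run per value, not a check of the stated bounds.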


research
04/21/2013

Prior-free and prior-dependent regret bounds for Thompson Sampling

We consider the stochastic multi-armed bandit problem with a prior distr...
research
05/24/2016

Refined Lower Bounds for Adversarial Bandits

We provide new lower bounds on the regret that must be suffered by adver...
research
10/30/2022

Revisiting Simple Regret Minimization in Multi-Armed Bandits

Simple regret is a natural and parameter-free performance criterion for ...
research
02/03/2020

Sample Complexity of Incentivized Exploration

We consider incentivized exploration: a version of multi-armed bandits w...
research
02/11/2013

Adaptive-treed bandits

We describe a novel algorithm for noisy global optimisation and continuu...
research
11/11/2022

Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits

We consider the stochastic linear contextual bandit problem with high-di...
research
05/19/2015

Risk and Regret of Hierarchical Bayesian Learners

Common statistical practice has shown that the full power of Bayesian me...
