Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors

08/16/2017
by Yichi Zhou, et al.

Thompson sampling has impressive empirical performance on many multi-armed bandit problems. However, current algorithms for Thompson sampling work only with conjugate priors, because they require inferring the posterior, which is often computationally intractable when the prior is not conjugate. In this paper, we propose a novel algorithm for Thompson sampling that only requires drawing samples from a tractable distribution, so it remains efficient even when the prior is non-conjugate. To do this, we first reformulate Thompson sampling as an optimization problem via the Gumbel-Max trick. We then construct a set of random variables such that the goal becomes identifying the one with the highest mean. Finally, we solve this identification problem with techniques from best arm identification.
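The Gumbel-Max trick mentioned above turns sampling from a categorical distribution into an argmax: adding independent standard Gumbel noise to each log-weight and taking the largest perturbed value yields index i with probability proportional to its weight. The following is a minimal sketch of that trick in isolation, not the paper's algorithm; the abstract does not specify how the trick is applied to the posterior. The function names and the example weights are hypothetical and chosen only for illustration.

```python
import math
import random


def gumbel_max_sample(log_weights, rng):
    """Return index i with probability proportional to exp(log_weights[i]).

    Illustrative helper (hypothetical name): perturbs each log-weight with
    standard Gumbel noise and returns the argmax.
    """
    best_index, best_value = -1, -math.inf
    for i, log_w in enumerate(log_weights):
        u = rng.random()
        while u == 0.0:  # avoid log(0); rng.random() lies in [0, 1)
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))  # standard Gumbel draw
        value = log_w + gumbel_noise
        if value > best_value:
            best_index, best_value = i, value
    return best_index


def empirical_frequencies(log_weights, n_draws, seed=0):
    """Estimate the sampling distribution of gumbel_max_sample by simulation."""
    rng = random.Random(seed)
    counts = [0] * len(log_weights)
    for _ in range(n_draws):
        counts[gumbel_max_sample(log_weights, rng)] += 1
    return [c / n_draws for c in counts]


# Assumed example: unnormalized weights 2, 3, 5 (probabilities 0.2, 0.3, 0.5).
example_log_weights = [math.log(2.0), math.log(3.0), math.log(5.0)]
frequencies = empirical_frequencies(example_log_weights, n_draws=20000)
```

With enough draws, `frequencies` should be close to 0.2, 0.3, and 0.5. Note that the weights need not be normalized, since a shared additive constant in the log-weights does not change the argmax.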


