- A Multi-armed Bandit MCMC, with applications in sampling from doubly intractable posterior
  Markov chain Monte Carlo (MCMC) algorithms are widely used to sample fro...
- Thompson Sampling and Approximate Inference
  We study the effects of approximate inference on the performance of Thom...
- No Free Lunch for Approximate MCMC
  It is widely known that the performance of Markov chain Monte Carlo (MCM...
- Neural Thompson Sampling
  Thompson Sampling (TS) is one of the most effective algorithms for solvi...
- Approximation Methods for Kernelized Bandits
  The RKHS bandit problem (also called kernelized multi-armed bandit probl...
- UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits
  In this work, we address the open problem of finding low-complexity near...
- Unbiased Bayes for Big Data: Paths of Partial Posteriors
  A key quantity of interest in Bayesian inference are expectations of fun...
On Thompson Sampling with Langevin Algorithms
Thompson sampling is a methodology for multi-armed bandit problems that is known to enjoy favorable performance in both theory and practice. It has, however, a significant computational limitation: it requires samples from posterior distributions at every iteration. We propose two Markov chain Monte Carlo (MCMC) methods tailored to Thompson sampling that address this issue. We construct quickly converging Langevin algorithms to generate approximate posterior samples with accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the MCMC hyper-parameters needed to guarantee optimal instance-dependent frequentist regret at low computational cost. In particular, our algorithms exploit both posterior concentration and a sample-reuse mechanism to ensure that only a constant number of iterations and a constant amount of data are needed in each round. The resulting approximate Thompson sampling algorithm achieves logarithmic regret, and its computational complexity does not scale with the time horizon.
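To make the approach concrete, here is a minimal sketch of approximate Thompson sampling on a Gaussian bandit, where each round's posterior draw is replaced by a constant number of unadjusted Langevin algorithm (ULA) steps warm-started from the previous round's sample, a crude stand-in for the paper's sample-reuse mechanism. The Gaussian reward model, the helper names (grad_log_posterior, langevin_sample, thompson_langevin), and the step-size and iteration choices are illustrative assumptions, not the tuned hyper-parameters or accuracy guarantees developed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_log_posterior(theta, rewards, prior_var=1.0, noise_var=1.0):
        # Gradient of the log posterior for one arm: standard normal prior
        # on the arm mean, Gaussian rewards with known noise variance.
        # (Illustrative model, not the paper's general setting.)
        return -theta / prior_var + np.sum(rewards - theta) / noise_var

    def langevin_sample(theta0, rewards, n_steps=20):
        # A constant number of unadjusted Langevin (ULA) steps from a warm
        # start. The step size shrinks with the amount of data so the
        # iteration stays stable; this only loosely mirrors the
        # data-dependent step sizes analyzed in the paper.
        step = 0.5 / (1.0 + len(rewards))
        theta = theta0
        for _ in range(n_steps):
            g = grad_log_posterior(theta, rewards)
            theta = theta + step * g + np.sqrt(2.0 * step) * rng.standard_normal()
        return theta

    def thompson_langevin(true_means, horizon=2000):
        k = len(true_means)
        rewards = [[] for _ in range(k)]  # observed rewards per arm
        warm = np.zeros(k)                # previous samples, reused as warm starts
        for _ in range(horizon):
            samples = np.array([
                langevin_sample(warm[a], np.asarray(rewards[a])) for a in range(k)
            ])
            warm = samples                # sample reuse across rounds
            a = int(np.argmax(samples))   # play the arm with the largest draw
            rewards[a].append(true_means[a] + rng.standard_normal())
        return rewards

    pulls = [len(r) for r in thompson_langevin([0.2, 0.5, 1.0])]
    print("pull counts per arm:", pulls)

Because a fixed number of ULA steps yields only an approximate draw, each sample carries some bias; the paper's contribution is precisely to choose step sizes and iteration counts so that this approximation error is small enough to preserve optimal logarithmic regret.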