On Thompson Sampling with Langevin Algorithms

02/23/2020
by Eric Mazumdar, et al.

Thompson sampling is a methodology for multi-armed bandit problems that is known to enjoy favorable performance in both theory and practice. However, it has a significant computational limitation: it requires samples from posterior distributions at every iteration. We propose two Markov chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly converging Langevin algorithms that generate approximate posterior samples with accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the MCMC hyper-parameters needed to guarantee optimal instance-dependent frequentist regret at low computational cost. In particular, our algorithms take advantage of both posterior concentration and a sample-reuse mechanism to ensure that only a constant number of iterations and a constant amount of data are needed in each round. The resulting approximate Thompson sampling algorithm has logarithmic regret, and its computational complexity does not scale with the time horizon of the algorithm.
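The idea in the abstract can be illustrated with a small sketch: a Gaussian multi-armed bandit where, instead of drawing an exact posterior sample each round, we run a constant number of unadjusted Langevin algorithm (ULA) steps warm-started from the previous round's iterate (a sample-reuse mechanism). This is not the paper's exact algorithm; the arm means, step-size rule, and iteration counts below are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical sketch: approximate Thompson sampling for a Gaussian bandit,
# with posterior samples produced by a constant number of ULA steps per round.

rng = np.random.default_rng(0)

true_means = np.array([0.2, 0.5, 0.8])  # unknown arm means (hypothetical)
n_arms = len(true_means)
sigma2 = 1.0       # known reward-noise variance
prior_var = 1.0    # N(0, prior_var) prior on each arm's mean

rewards = [[] for _ in range(n_arms)]  # observed rewards per arm
theta = np.zeros(n_arms)               # persistent Langevin iterates (reused across rounds)

def grad_log_post(arm, th):
    """Gradient of the Gaussian log-posterior of `arm`'s mean."""
    g = -th / prior_var                                             # prior term
    if rewards[arm]:
        g += (sum(rewards[arm]) - len(rewards[arm]) * th) / sigma2  # likelihood term
    return g

T = 500            # time horizon
n_langevin = 10    # constant number of ULA steps per round

for t in range(T):
    for a in range(n_arms):
        # Scaling the step size to the posterior precision keeps ULA stable
        # as data accumulates (an illustrative tuning, not the paper's).
        prec = 1.0 / prior_var + len(rewards[a]) / sigma2
        h = 0.5 / prec
        for _ in range(n_langevin):
            theta[a] += h * grad_log_post(a, theta[a]) + np.sqrt(2.0 * h) * rng.normal()
    arm = int(np.argmax(theta))  # Thompson step: act greedily on the approximate sample
    rewards[arm].append(true_means[arm] + rng.normal())

pulls = [len(r) for r in rewards]
print(pulls)  # the best arm (index 2) should receive most pulls
```

Because the Langevin chain is warm-started each round and the posterior concentrates, a fixed small number of gradient steps suffices, so per-round cost stays constant rather than growing with the horizon.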


