Thompson Sampling and Approximate Inference

08/14/2019
by My Phan, et al.

We study the effects of approximate inference on the performance of Thompson sampling in k-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making, but it requires posterior inference, which often must be approximated in practice. We show that even a small constant inference error (in α-divergence) can lead to poor performance (linear regret) due to under-exploration (for α < 1) or over-exploration (for α > 0) by the approximation. While for α > 0 this is unavoidable, for α ≤ 0 the regret can be improved by adding a small amount of forced exploration, even when the inference error is a large constant.
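The remedy mentioned above for under-exploring approximations (α ≤ 0) is to mix a small amount of forced exploration into Thompson sampling. A minimal sketch for a k-armed Bernoulli bandit, using exact Beta posteriors as a stand-in for the approximate inference studied in the paper; the function name, the fixed ε mixing rate, and the Beta(1, 1) prior are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def thompson_sampling_bernoulli(pull, k, horizon, eps=0.0, rng=None):
    """Thompson sampling for a k-armed Bernoulli bandit with Beta(1, 1) priors.

    `eps` mixes in uniform forced exploration (the fix the paper proposes for
    under-exploring posterior approximations); the paper's exact exploration
    schedule may differ from this constant-rate sketch.
    """
    rng = rng if rng is not None else np.random.default_rng()
    alphas = np.ones(k)  # Beta posterior "success" counts per arm
    betas = np.ones(k)   # Beta posterior "failure" counts per arm
    rewards = []
    for _ in range(horizon):
        if rng.random() < eps:
            arm = int(rng.integers(k))               # forced uniform exploration
        else:
            samples = rng.beta(alphas, betas)        # one posterior sample per arm
            arm = int(np.argmax(samples))            # act greedily on the sample
        r = pull(arm)                                # observe a 0/1 reward
        alphas[arm] += r                             # conjugate posterior update
        betas[arm] += 1 - r
        rewards.append(r)
    return alphas, betas, rewards
```

With a well-separated two-armed instance (e.g. means 0.2 and 0.8), the posterior counts concentrate on the better arm, so it is pulled far more often over a moderate horizon.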


Related research

01/31/2022 | Generalized Bayesian Upper Confidence Bound with Approximate Inference for Bandit Problems
Bayesian bandit algorithms with approximate inference have been widely u...

03/27/2017 | Thompson Sampling for Linear-Quadratic Control Problems
We consider the exploration-exploitation tradeoff in linear quadratic (L...

02/23/2020 | On Thompson Sampling with Langevin Algorithms
Thompson sampling is a methodology for multi-armed bandit problems that ...

02/26/2018 | Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
Recent advances in deep reinforcement learning have made significant str...

10/30/2021 | Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling
Using bandit algorithms to conduct adaptive randomised experiments can m...

06/30/2023 | Thompson sampling for improved exploration in GFlowNets
Generative flow networks (GFlowNets) are amortized variational inference...

11/05/2021 | Maillard Sampling: Boltzmann Exploration Done Optimally
The PhD thesis of Maillard (2013) presents a randomized algorithm for th...
