Thompson Sampling via Local Uncertainty

10/30/2019
by Zhendong Wang, et al.

Thompson sampling is an efficient algorithm for sequential decision making, which exploits the posterior uncertainty to solve the exploration-exploitation dilemma. There has been significant recent interest in integrating Bayesian neural networks into Thompson sampling. Most of these methods rely on global variable uncertainty for exploration. In this paper, we propose a new probabilistic modeling framework for Thompson sampling, where local latent variable uncertainty is used to sample the mean reward. Variational inference is used to approximate the posterior of the local variable, and a semi-implicit structure is further introduced to enhance its expressiveness. Our experimental results on eight contextual bandit benchmark datasets show that Thompson sampling guided by local uncertainty achieves state-of-the-art performance while having low computational complexity.
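For readers unfamiliar with the basic idea, the following is a minimal sketch of generic Thompson sampling on a non-contextual Bernoulli bandit with Beta posteriors. It is not the local-uncertainty model proposed in the paper; the function name, the arm means, and the round count are all illustrative assumptions. It only shows how sampling a mean reward from the posterior, then acting greedily on the sample, balances exploration and exploitation.

```python
import numpy as np


def thompson_sampling_bernoulli(true_means, n_rounds=2000, seed=0):
    """Illustrative Beta-Bernoulli Thompson sampling (hypothetical helper).

    true_means: per-arm success probabilities, unknown to the agent.
    Returns the pull counts and the final Beta posterior parameters.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)

    # Beta(1, 1) prior on each arm's mean reward (uniform on [0, 1]).
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    pulls = np.zeros(n_arms, dtype=int)

    for _ in range(n_rounds):
        # Draw one plausible mean reward per arm from the current posterior.
        sampled_means = rng.beta(alpha, beta)
        # Act greedily with respect to the sampled means.
        arm = int(np.argmax(sampled_means))
        # Observe a Bernoulli reward from the chosen arm.
        reward = int(rng.random() < true_means[arm])
        # Conjugate posterior update for the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling_bernoulli([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
    print("posterior means:", alpha / (alpha + beta))
```

Arms with uncertain posteriors occasionally produce high samples and get explored, while arms with confidently low means are pulled less and less. In the contextual setting the abstract describes, the Beta posterior would be replaced by a posterior over a model of reward given context.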

research
02/26/2018

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

Recent advances in deep reinforcement learning have made significant str...
research
07/19/2023

VITS : Variational Inference Thomson Sampling for contextual bandits

In this paper, we introduce and analyze a variant of the Thompson sampli...
research
04/05/2018

Variational Rejection Sampling

Learning latent variable models with stochastic variational inference is...
research
03/15/2021

Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise

To adopt neural networks in safety critical domains, knowing whether we ...
research
06/30/2023

Thompson sampling for improved exploration in GFlowNets

Generative flow networks (GFlowNets) are amortized variational inference...
research
12/30/2017

Learning Structural Weight Uncertainty for Sequential Decision-Making

Learning probability distributions on the weights of neural networks (NN...
research
07/18/2021

GuideBoot: Guided Bootstrap for Deep Contextual Bandits

The exploration/exploitation (E&E) dilemma lies at the core of interac...
