Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

02/26/2018
by Carlos Riquelme, et al.

Recent advances in deep reinforcement learning have yielded significant performance gains on applications such as Go and Atari games. However, the problem of developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to posterior samples of the model. At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical. Thus, it is attractive to consider approximate Bayesian neural networks in a Thompson Sampling framework. To understand the impact of using an approximate posterior on Thompson Sampling, we benchmark well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems. We find that many approaches that have been successful in the supervised learning setting underperform in the sequential decision-making scenario. In particular, we highlight the challenge of adapting slowly converging uncertainty estimates to the online setting.
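
As an illustration of why Thompson Sampling "only requires access to posterior samples", the following is a minimal sketch on a toy contextual bandit with discrete contexts and Bernoulli rewards. It is not one of the benchmarked methods: it assumes an exact, independent Beta posterior per (context, arm) pair instead of an approximate Bayesian neural network, and the names `BetaThompsonSampler`, `run_toy_bandit`, and the reward probabilities in `true_probs` are hypothetical choices made for this example.

```python
import random


class BetaThompsonSampler:
    """Thompson Sampling with an independent Beta posterior per (context, arm)."""

    def __init__(self, n_contexts, n_arms, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior, stored as [alpha, beta] for each (context, arm).
        self.params = [[[1, 1] for _ in range(n_arms)] for _ in range(n_contexts)]

    def select_arm(self, context):
        # Draw one posterior sample per arm, then act greedily on the samples.
        samples = [self.rng.betavariate(a, b) for a, b in self.params[context]]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, context, arm, reward):
        # Conjugate update for a Bernoulli reward in {0, 1}.
        self.params[context][arm][0] += reward
        self.params[context][arm][1] += 1 - reward


def run_toy_bandit(n_rounds=5000, seed=0):
    """Return the fraction of rounds in which the best arm was chosen."""
    # Hypothetical ground truth: true_probs[context][arm] is P(reward = 1).
    true_probs = [[0.2, 0.8], [0.7, 0.3]]
    env_rng = random.Random(seed + 1)
    agent = BetaThompsonSampler(n_contexts=2, n_arms=2, seed=seed)
    optimal_pulls = 0
    for _ in range(n_rounds):
        context = env_rng.randrange(2)
        arm = agent.select_arm(context)
        reward = 1 if env_rng.random() < true_probs[context][arm] else 0
        agent.update(context, arm, reward)
        best_arm = max(range(2), key=true_probs[context].__getitem__)
        optimal_pulls += arm == best_arm
    return optimal_pulls / n_rounds
```

The agent never computes anything about the posterior beyond drawing a sample from it in `select_arm`, which is the property that makes it natural to swap in an approximate posterior sampler. Calling `run_toy_bandit()` returns the share of rounds where the sampled action matched the best arm for the observed context.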

research
10/30/2019

Thompson Sampling via Local Uncertainty

Thompson sampling is an efficient algorithm for sequential decision maki...
research
02/13/2018

Efficient Exploration through Bayesian Deep Q-Networks

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling...
research
10/06/2021

Residual Overfit Method of Exploration

Exploration is a crucial aspect of bandit and reinforcement learning alg...
research
05/25/2018

Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming

We design a new myopic strategy for a wide class of sequential design of...
research
07/18/2021

GuideBoot: Guided Bootstrap for Deep Contextual Bandits

The exploration/exploitation (E&E) dilemma lies at the core of interac...
research
11/04/2021

Infinite Time Horizon Safety of Bayesian Neural Networks

Bayesian neural networks (BNNs) place distributions over the weights of ...
research
08/14/2019

Thompson Sampling and Approximate Inference

We study the effects of approximate inference on the performance of Thom...
