Bootstrapped Thompson Sampling and Deep Exploration

07/01/2015
by Ian Osband, et al.

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
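
The note's abstract gives no pseudocode, so the following is a minimal sketch of the idea for a Bernoulli multi-armed bandit. The function name select_arm, the arm success probabilities, the choice of one artificial success and one artificial failure per arm, and the horizon of 1,000 rounds are all illustrative assumptions rather than the paper's exact construction; the artificially generated observations stand in for the induced prior distribution described above.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Bernoulli bandit: these arm success probabilities are
# illustrative, not taken from the paper.
true_means = [0.3, 0.5, 0.7]
n_arms = len(true_means)

# Artificially generated "prior" data: one failure and one success per arm.
# These pseudo-observations play the role of the induced prior, keeping early
# bootstrap samples from collapsing onto a single observed outcome.
histories = [[0.0, 1.0] for _ in range(n_arms)]

def select_arm(histories, rng):
    """Pick an arm via a bootstrapped stand-in for Thompson sampling.

    For each arm, resample its history (observed rewards plus artificial
    prior data) with replacement and use the resampled mean in place of a
    posterior sample; play the arm whose bootstrap mean is largest.
    """
    boot_means = []
    for rewards in histories:
        sample = rng.choice(rewards, size=len(rewards), replace=True)
        boot_means.append(sample.mean())
    return int(np.argmax(boot_means))

for t in range(1000):
    arm = select_arm(histories, rng)
    reward = float(rng.random() < true_means[arm])  # simulated Bernoulli pull
    histories[arm].append(reward)

Without the artificial prior data, an arm whose only observed pull was a failure would yield bootstrap samples that are always zero, and the algorithm could stop exploring it entirely; the pseudo-observations keep the resampled means dispersed enough to drive exploration, which is the role the abstract attributes to the induced prior.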
