A Note on Information-Directed Sampling and Thompson Sampling

03/24/2015
by Li Zhou, et al.

This note introduces three Bayesian-style multi-armed bandit algorithms: Information-Directed Sampling, Thompson Sampling, and Generalized Thompson Sampling. The goal is to give an intuitive explanation of these three algorithms and their regret bounds, and to provide some derivations that are omitted in the original papers.
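For readers who want a concrete reference point before opening the note, below is a minimal sketch of the simplest of the three algorithms, Thompson Sampling, in its Beta-Bernoulli form: sample a mean for each arm from its posterior, play the arm with the largest sample, and update the played arm's posterior. The sketch is not taken from the note itself; the arm means and horizon are made-up illustration values.

import random

def thompson_sampling(true_means, horizon, seed=0):
    # Beta-Bernoulli Thompson Sampling: each arm i keeps a Beta(alpha_i, beta_i)
    # posterior over its unknown success probability.
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta(1, 1) uniform prior for every arm
    beta = [1] * k
    total_reward = 0
    for _ in range(horizon):
        # Posterior sampling step: one draw per arm, then act greedily on the draws.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward       # count a success
        beta[arm] += 1 - reward    # count a failure
        total_reward += reward
    return total_reward

# Hypothetical 3-armed instance; arm 2 (mean 0.7) is optimal.
print(thompson_sampling([0.2, 0.5, 0.7], horizon=2000))

Information-Directed Sampling differs in that it does not act on a single posterior draw: at each step it chooses the action distribution minimizing the information ratio, the squared expected regret divided by the expected information gain, and bounding this ratio is what drives the regret analysis of Russo and Van Roy discussed in the note.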


Related research:

03/07/2022 · PAC-Bayesian Lifelong Learning For Multi-Armed Bandits
We present a PAC-Bayesian analysis of lifelong learning. In the lifelong...

06/02/2015 · Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
We discuss a multiple-play multi-armed bandit (MAB) problem in which sev...

05/23/2018 · Analysis of Thompson Sampling for Graphical Bandits Without the Graphs
We study multi-armed bandit problems with graph feedback, in which the d...

08/18/2011 · Doing Better Than UCT: Rational Monte Carlo Sampling in Trees
UCT, a state-of-the-art algorithm for Monte Carlo tree sampling (MCTS), ...

08/16/2017 · Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors
Thompson sampling has impressive empirical performance for many multi-ar...

07/01/2015 · Bootstrapped Thompson Sampling and Deep Exploration
This technical note presents a new approach to carrying out the kind of ...

08/09/2018 · A note on partial rejection sampling for the hard disks model in the plane
In this note, we slightly improve the guarantees obtained by Guo and Jer...
