Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

05/18/2012
by Emilie Kaufmann, et al.

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
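For intuition, a minimal sketch of Thompson Sampling in the Bernoulli setting the paper analyzes: each arm keeps a Beta posterior over its success probability, and at each round the algorithm samples from every posterior and pulls the arm with the highest sample. The arm means and horizon below are illustrative, not from the paper.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Thompson Sampling with independent Beta(1, 1) priors on
    Bernoulli arms; returns the cumulative (pseudo-)regret."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed 1-rewards per arm
    failures = [0] * k   # observed 0-rewards per arm
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior and play
        # the arm whose sampled mean is largest.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        # Observe a Bernoulli reward and update that arm's posterior.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        regret += best - true_means[arm]
    return regret

# Two arms with means 0.5 and 0.6: regret should grow roughly
# logarithmically in the horizon, far below the linear worst case.
print(thompson_sampling([0.5, 0.6], horizon=10000))
```

The finite-time bound proved in the paper guarantees that this cumulative regret matches the Lai and Robbins asymptotic rate, i.e. it is of order (log T)/KL between the arm distributions.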


Related research

11/30/2018: Asymptotically Optimal Multi-Armed Bandit Activation Policies under Side Constraints
This paper introduces the first asymptotically optimal strategy for the ...

06/02/2015: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
We discuss a multiple-play multi-armed bandit (MAB) problem in which sev...

06/20/2022: Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality
We revisit the problem of stochastic online learning with feedback graph...

12/13/2021: Risk and optimal policies in bandit experiments
This paper provides a decision theoretic analysis of bandit experiments....

06/14/2022: On the Finite-Time Performance of the Knowledge Gradient Algorithm
The knowledge gradient (KG) algorithm is a popular and effective algorit...

05/08/2018: Profitable Bandits
Originally motivated by default risk management applications, this paper...

03/04/2020: Odds-Ratio Thompson Sampling to Control for Time-Varying Effect
Multi-armed bandit methods have been used for dynamic experiments partic...
