Asymptotic Convergence of Thompson Sampling

11/08/2020
by Cem Kalkanli, et al.

Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed the finite-time performance of Thompson sampling and proved that it achieves sub-linear regret under a broad range of probabilistic settings. However, its asymptotic behavior remains largely underexplored. In this paper, we prove an asymptotic convergence result for Thompson sampling under the assumption of sub-linear Bayesian regret, and show that the actions of a Thompson sampling agent provide a strongly consistent estimator of the optimal action. Our results rely on the martingale structure inherent in Thompson sampling.
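
As an informal illustration of the kind of behavior the abstract describes, the following is a minimal sketch of Thompson sampling on a Bernoulli bandit. It is not taken from the paper: the Beta(1, 1) priors, the Bernoulli reward model, the arm means, and the function name `thompson_sampling` are all assumptions chosen for the example. The sketch only lets one observe empirically that the chosen actions tend to concentrate on the best arm; it does not demonstrate the paper's theorem.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return the list of chosen arms.

    Assumptions (not from the source): independent Beta(1, 1) priors per arm
    and Bernoulli rewards with the given true means.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # 1 + number of observed successes per arm
    beta = [1.0] * k   # 1 + number of observed failures per arm
    actions = []
    for _ in range(horizon):
        # Draw one sample from each arm's posterior and play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        actions.append(arm)
    return actions


if __name__ == "__main__":
    actions = thompson_sampling([0.2, 0.5, 0.8], horizon=5000)
    tail = actions[-500:]
    print("fraction of last 500 pulls on the best arm:", tail.count(2) / len(tail))
```

In runs of this sketch, the fraction of late pulls spent on the arm with the highest mean is typically close to one, which is the intuition behind reading the agent's actions as an estimator of the optimal action.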

research
03/23/2021

Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

Reducing the variance of the gradient estimator is known to improve the ...
research
11/11/2020

Asymptotically Optimal Information-Directed Sampling

We introduce a computationally efficient algorithm for finite stochastic...
research
04/18/2019

Asymptotic Behavior of Bayesian Learners with Misspecified Models

We consider an agent who represents uncertainty about her environment vi...
research
06/20/2022

Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality

We revisit the problem of stochastic online learning with feedback graph...
research
02/10/2021

On the Suboptimality of Thompson Sampling in High Dimensions

In this paper we consider Thompson Sampling for combinatorial semi-bandi...
research
02/13/2020

Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

We consider a data corruption scenario in the classical k Nearest Neighb...
research
05/08/2018

Profitable Bandits

Originally motivated by default risk management applications, this paper...
