Log In Sign Up

Asymptotic Convergence of Thompson Sampling

by   Cem Kalkanli, et al.

Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed the finite time performance of Thompson sampling, and proved that it achieves a sub-linear regret under a broad range of probabilistic settings. However its asymptotic behavior remains mostly underexplored. In this paper, we prove an asymptotic convergence result for Thompson sampling under the assumption of a sub-linear Bayesian regret, and show that the actions of a Thompson sampling agent provide a strongly consistent estimator of the optimal action. Our results rely on the martingale structure inherent in Thompson sampling.


page 1

page 2

page 3

page 4


Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

Reducing the variance of the gradient estimator is known to improve the ...

Asymptotically Optimal Information-Directed Sampling

We introduce a computationally efficient algorithm for finite stochastic...

Asymptotic Behavior of Bayesian Learners with Misspecified Models

We consider an agent who represents uncertainty about her environment vi...

Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality

We revisit the problem of stochastic online learning with feedback graph...

Implications of Regret on Stability of Linear Dynamical Systems

The setting of an agent making decisions under uncertainty and under dyn...

On the Suboptimality of Thompson Sampling in High Dimensions

In this paper we consider Thompson Sampling for combinatorial semi-bandi...

Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

We consider a data corruption scenario in the classical k Nearest Neighb...