Linear Thompson Sampling Revisited

11/20/2016
by Marc Abeille, et al.

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order O(d^(3/2)√T), as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting the optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm whose sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional √d factor in the regret compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
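To make the setting concrete, the sketch below shows one common form of linear Thompson sampling: maintain a regularized least-squares estimate of the unknown parameter, sample a perturbed parameter from a Gaussian centered at that estimate with an inflated covariance, and play the arm that is optimal for the sampled parameter. The function name, the inflation factor v, and the simulated environment are illustrative assumptions, not code from the paper.

```python
import numpy as np

def linear_thompson_sampling(arms, T, reward_fn, sigma=1.0, lam=1.0, v=1.0):
    """Sketch of Thompson sampling for the stochastic linear bandit.

    arms      : (K, d) array of arm feature vectors
    T         : horizon (number of rounds)
    reward_fn : callback returning a noisy reward for the chosen arm index
    sigma     : noise scale, lam : ridge regularization, v : covariance inflation
    """
    K, d = arms.shape
    V = lam * np.eye(d)          # regularized design matrix
    b = np.zeros(d)              # running sum of x_t * r_t
    rewards = []

    for t in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b    # regularized least-squares estimate

        # Sample a perturbed parameter; the covariance is inflated so that the
        # sample is optimistic with fixed probability (this inflation is where
        # the extra sqrt(d) factor in the regret bound enters).
        theta_tilde = np.random.multivariate_normal(
            theta_hat, (v * sigma) ** 2 * V_inv
        )

        # Play the arm that is optimal for the sampled parameter.
        idx = int(np.argmax(arms @ theta_tilde))
        x = arms[idx]
        r = reward_fn(idx)

        # Rank-one update of the design matrix and the reward vector.
        V += np.outer(x, x)
        b += r * x
        rewards.append(r)

    return rewards


# Example run (illustrative): 20 random unit arms in d = 5 dimensions,
# linear rewards with Gaussian noise around a hidden parameter theta_star.
rng = np.random.default_rng(0)
theta_star = rng.normal(size=5)
theta_star /= np.linalg.norm(theta_star)
arms = rng.normal(size=(20, 5))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
rewards = linear_thompson_sampling(
    arms, T=1000, reward_fn=lambda i: arms[i] @ theta_star + 0.1 * rng.normal()
)
```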

Related research

03/27/2017 · Thompson Sampling for Linear-Quadratic Control Problems
02/12/2020 · A General Framework to Analyze Stochastic Linear Bandit
05/05/2016 · Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
06/07/2016 · Regret Bounds for Non-decomposable Metrics with Missing Labels
06/21/2019 · Randomized Exploration in Generalized Linear Bandits
02/15/2018 · Bandit Learning with Positive Externalities
06/21/2019 · Thompson Sampling for Adversarial Bit Prediction
