Thompson Sampling for Contextual Bandits with Linear Payoffs

09/15/2012
by   Shipra Agrawal, et al.
0

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studied versions of the contextual bandits problem. We provide the first theoretical guarantees for the contextual version of Thompson Sampling. We prove a high probability regret bound of Õ(d^3/2√(T)) (or Õ(d√(T (N)))), which is the best regret bound achieved by any computationally efficient algorithm available for this problem in the current literature, and is within a factor of √(d) (or √((N))) of the information-theoretic lower bound for this problem.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/15/2012

Further Optimal Regret Bounds for Thompson Sampling

Thompson Sampling is one of the oldest heuristics for multi-armed bandit...
research
05/24/2019

OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits

We consider the stochastic linear (multi-armed) contextual bandit proble...
research
01/08/2020

On Thompson Sampling for Smoother-than-Lipschitz Bandits

Thompson Sampling is a well established approach to bandit and reinforce...
research
04/26/2022

Rate-Constrained Remote Contextual Bandits

We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) p...
research
06/30/2015

Scalable Discrete Sampling as a Multi-Armed Bandit Problem

Drawing a sample from a discrete distribution is one of the building com...
research
06/30/2022

Ranking in Contextual Multi-Armed Bandits

We study a ranking problem in the contextual multi-armed bandit setting....
research
11/16/2020

Corrupted Contextual Bandits with Action Order Constraints

We consider a variant of the novel contextual bandit problem with corrup...

Please sign up or login with your details

Forgot password? Click here to reset