TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation

06/11/2020
by Jackie Baek, et al.

Thompson sampling has become a ubiquitous approach to online decision problems with bandit feedback. The key algorithmic task for Thompson sampling is drawing a sample from the posterior of the optimal action. We propose an alternative arm selection rule, which we dub TS-UCB, that requires negligible additional computational effort but provides significant performance improvements relative to Thompson sampling. At each step, TS-UCB computes a score for each arm using two ingredients: posterior sample(s) and upper confidence bounds. TS-UCB can be used in any setting where these two quantities are available, and it is flexible in the number of posterior samples it takes as input. This flexibility proves particularly valuable in heuristics for deep contextual bandits: we show that TS-UCB achieves materially lower regret on all problem instances in a deep bandit suite proposed in Riquelme et al. (2018). Finally, from a theoretical perspective, we establish optimal regret guarantees for TS-UCB in both the K-armed and linear bandit models.
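The abstract does not spell out the TS-UCB score, so the following is only a minimal sketch under stated assumptions. It assumes a Beta-Bernoulli K-armed bandit, and it assumes the score for arm a takes the ratio form (f_tilde - mean_a) / (U_a - mean_a), where f_tilde averages the maximum sampled arm value over one or more posterior draws and U_a is an upper confidence bound. The UCB1-style radius, the helper names `thompson_sampling_arm`, `ts_ucb_arm`, and `run`, and the parameter `num_samples` are hypothetical illustrations, not the authors' code.

```python
import math
import random


def thompson_sampling_arm(alpha, beta, t, rng):
    """Thompson sampling: one Beta draw per arm, play the largest draw (t unused)."""
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)


def ts_ucb_arm(alpha, beta, t, rng, num_samples=1):
    """TS-UCB-style rule under an ASSUMED score; returns the arm with the lowest score."""
    k = len(alpha)
    means = [a / (a + b) for a, b in zip(alpha, beta)]  # posterior means
    # f_tilde: average over num_samples draws of the best sampled arm value.
    f_tilde = sum(
        max(rng.betavariate(a, b) for a, b in zip(alpha, beta))
        for _ in range(num_samples)
    ) / num_samples
    scores = []
    for i in range(k):
        pulls = alpha[i] + beta[i] - 2.0  # observations under a Beta(1, 1) prior
        # Hypothetical UCB1-style radius, so U_i - mean_i equals this radius.
        radius = math.sqrt(2.0 * math.log(max(t, 2)) / max(pulls, 1.0))
        scores.append((f_tilde - means[i]) / radius)
    return min(range(k), key=scores.__getitem__)


def run(policy, true_means, horizon, seed):
    """Simulate a Bernoulli bandit and return cumulative (pseudo-)regret."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha, beta = [1.0] * k, [1.0] * k
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        arm = policy(alpha, beta, t, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward  # conjugate Beta update
        beta[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret


if __name__ == "__main__":
    for name, policy in [("TS", thompson_sampling_arm), ("TS-UCB sketch", ts_ucb_arm)]:
        print(name, run(policy, [0.3, 0.5, 0.7], horizon=2000, seed=1))
```

In this sketch both rules draw posterior samples at every step; the extra work for the TS-UCB-style rule is one confidence radius and one division per arm, in line with the abstract's claim of negligible additional computation. Raising `num_samples` above 1 averages several posterior draws into `f_tilde`, which mirrors the stated flexibility in the number of posterior samples.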

Related research
11/08/2022

Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms

In this paper, we address the stochastic contextual linear bandit proble...
09/15/2012

Further Optimal Regret Bounds for Thompson Sampling

Thompson Sampling is one of the oldest heuristics for multi-armed bandit...
05/28/2021

Asymptotically Optimal Bandits under Weighted Information

We study the problem of regret minimization in a multi-armed bandit setu...
10/15/2014

Thompson sampling with the online bootstrap

Thompson sampling provides a solution to bandit problems in which new ob...
03/07/2018

Satisficing in Time-Sensitive Bandit Learning

Much of the recent literature on bandit learning focuses on algorithms t...
12/03/2018

Thompson Sampling for Noncompliant Bandits

Thompson sampling, a Bayesian method for balancing exploration and explo...
10/23/2017

Sequential Matrix Completion

We propose a novel algorithm for sequential matrix completion in a recom...
