New Insights into Bootstrapping for Bandits

by Sharan Vaswani et al.
The University of British Columbia

We investigate the use of bootstrapping in the bandit setting. We first show that the commonly used non-parametric bootstrapping (NPB) procedure can be provably inefficient: under a bandit model with Bernoulli rewards, we establish a near-linear lower bound on the regret it incurs. We then show that NPB with an appropriate amount of forced exploration achieves sub-linear, albeit sub-optimal, regret. As an alternative to NPB, we propose a weighted bootstrapping (WB) procedure. For Bernoulli rewards, WB with multiplicative exponential weights is mathematically equivalent to Thompson sampling (TS) and thus attains near-optimal regret bounds. Similarly, for Gaussian rewards, WB with additive Gaussian weights achieves near-optimal regret. Beyond these special cases, WB leads to better empirical performance than TS for several reward distributions bounded on [0, 1]. For the contextual bandit setting, we give practical guidelines that make bootstrapping simple and efficient to implement, and that result in good empirical performance on real-world datasets.
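To make the WB procedure concrete, the following is a minimal sketch (not the authors' code; the function names `wb_index` and `wb_bandit`, the Exp(1) weight choice, and the single pseudo-success/pseudo-failure prior are illustrative assumptions) of weighted bootstrapping with multiplicative exponential weights on a Bernoulli bandit. For 0/1 rewards, the exponentially weighted mean of the augmented history is distributed as Beta(s + 1, f + 1), which is why this variant coincides with Thompson sampling under a Beta(1, 1) prior.

```python
import numpy as np

def wb_index(rewards, rng):
    """Weighted-bootstrap index for one arm (Bernoulli rewards).

    Draw multiplicative weights w_i ~ Exp(1) and return the weighted
    mean of the observed rewards, augmented with one pseudo-success (1)
    and one pseudo-failure (0) so the index is defined before the arm
    has ever been pulled. With s ones and f zeros in the augmented
    history, the weighted mean is distributed as Beta(s, f), i.e. a
    Thompson-sampling draw.
    """
    augmented = np.concatenate([np.asarray(rewards, dtype=float), [0.0, 1.0]])
    w = rng.exponential(1.0, size=augmented.size)  # multiplicative Exp(1) weights
    return np.dot(w, augmented) / w.sum()

def wb_bandit(means, horizon, rng):
    """Run weighted bootstrapping on a K-armed Bernoulli bandit.

    At each round, resample an index for every arm and pull the argmax.
    Returns the number of pulls of each arm.
    """
    history = [[] for _ in means]
    for _ in range(horizon):
        arm = int(np.argmax([wb_index(h, rng) for h in history]))
        history[arm].append(float(rng.random() < means[arm]))  # Bernoulli reward
    return [len(h) for h in history]

rng = np.random.default_rng(0)
pulls = wb_bandit([0.2, 0.5, 0.8], horizon=2000, rng=rng)
```

With this seed the best arm (mean 0.8) accumulates the large majority of the 2000 pulls, the behavior one expects from the TS-equivalent sampler. Replacing the exponential weights with additive Gaussian perturbations of the empirical mean gives the Gaussian-reward variant discussed in the abstract.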



