New Insights into Bootstrapping for Bandits

by Sharan Vaswani et al.
The University of British Columbia

We investigate the use of bootstrapping in the bandit setting. We first show that the commonly used non-parametric bootstrapping (NPB) procedure can be provably inefficient: under a bandit model with Bernoulli rewards, we establish a near-linear lower bound on the regret it incurs. We then show that NPB with an appropriate amount of forced exploration achieves sub-linear, albeit sub-optimal, regret. As an alternative to NPB, we propose a weighted bootstrapping (WB) procedure. For Bernoulli rewards, WB with multiplicative exponential weights is mathematically equivalent to Thompson sampling (TS) and thus attains near-optimal regret bounds. Similarly, for Gaussian rewards, WB with additive Gaussian weights achieves near-optimal regret. Beyond these special cases, WB leads to better empirical performance than TS for several reward distributions bounded on [0, 1]. For the contextual bandit setting, we give practical guidelines that make bootstrapping simple and efficient to implement, and that result in good empirical performance on real-world datasets.
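To make the WB procedure concrete, the following is a minimal sketch (not the authors' code; the function names `wb_index` and `wb_bandit`, the Exp(1) weight choice, and the single pseudo-success/pseudo-failure prior are illustrative assumptions) of weighted bootstrapping with multiplicative exponential weights on a Bernoulli bandit. For 0/1 rewards, the exponentially weighted mean of the augmented history is distributed as Beta(s + 1, f + 1), which is why this variant coincides with Thompson sampling under a Beta(1, 1) prior.

```python
import numpy as np

def wb_index(rewards, rng):
    """Weighted-bootstrap index for one arm (Bernoulli rewards).

    Draw multiplicative weights w_i ~ Exp(1) and return the weighted
    mean of the observed rewards, augmented with one pseudo-success (1)
    and one pseudo-failure (0) so the index is defined before the arm
    has ever been pulled. With s ones and f zeros in the augmented
    history, the weighted mean is distributed as Beta(s, f), i.e. a
    Thompson-sampling draw.
    """
    augmented = np.concatenate([np.asarray(rewards, dtype=float), [0.0, 1.0]])
    w = rng.exponential(1.0, size=augmented.size)  # multiplicative Exp(1) weights
    return np.dot(w, augmented) / w.sum()

def wb_bandit(means, horizon, rng):
    """Run weighted bootstrapping on a K-armed Bernoulli bandit.

    At each round, resample an index for every arm and pull the argmax.
    Returns the number of pulls of each arm.
    """
    history = [[] for _ in means]
    for _ in range(horizon):
        arm = int(np.argmax([wb_index(h, rng) for h in history]))
        history[arm].append(float(rng.random() < means[arm]))  # Bernoulli reward
    return [len(h) for h in history]

rng = np.random.default_rng(0)
pulls = wb_bandit([0.2, 0.5, 0.8], horizon=2000, rng=rng)
```

With this seed the best arm (mean 0.8) accumulates the large majority of the 2000 pulls, the behavior one expects from the TS-equivalent sampler. Replacing the exponential weights with additive Gaussian perturbations of the empirical mean gives the Gaussian-reward variant discussed in the abstract.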



