
Residual Bootstrap Exploration for Stochastic Linear Bandit

by Shuang Wu et al.

We propose a new bootstrap-based online algorithm for stochastic linear bandit problems. The key idea is residual bootstrap exploration: the agent estimates the next-step reward by resampling the residuals of its mean reward estimate. Our algorithm, residual bootstrap exploration for stochastic linear bandit, draws a linear reward estimate from this resampling distribution and pulls the arm with the highest estimate. In particular, we contribute a theoretical framework that demystifies residual bootstrap-based exploration mechanisms in stochastic linear bandit problems. The key insight is that the strength of bootstrap exploration rests on collaborated optimism between the online-learned model and the resampling distribution of the residuals. This observation enables us to show that the proposed algorithm secures a high-probability Õ(d√n) sub-linear regret under mild conditions. Our experiments demonstrate that the principle generalizes easily to various formulations of the linear bandit problem, and they show that the algorithm is highly computationally efficient.
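The exploration loop described in the abstract can be sketched in code: fit a ridge estimate of the linear reward, resample the fit's residuals with replacement to form bootstrap pseudo-rewards, refit on those pseudo-rewards, and pull the arm that maximizes the perturbed estimate. This is a minimal illustrative sketch, not the authors' reference implementation; the class name, the ridge regularization, the round-robin warm-up, and the i.i.d. residual resampling are all assumptions made for the example.

```python
import numpy as np

class ResidualBootstrapLinBandit:
    """Illustrative residual-bootstrap exploration for a linear bandit
    (a sketch under assumed design choices, not the paper's algorithm)."""

    def __init__(self, d, lam=1.0, rng=None):
        self.d = d
        self.lam = lam                        # ridge regularization (assumed)
        self.rng = rng or np.random.default_rng(0)
        self.X = []                           # features of pulled arms
        self.y = []                           # observed rewards

    def _ridge(self, X, y):
        # Ridge least-squares estimate of the linear reward parameter.
        A = X.T @ X + self.lam * np.eye(self.d)
        return np.linalg.solve(A, X.T @ y)

    def select(self, arms):
        """arms: (K, d) array of candidate arm feature vectors."""
        if len(self.y) < self.d:              # warm-up: round-robin pulls
            return len(self.y) % len(arms)
        X = np.vstack(self.X)
        y = np.asarray(self.y)
        theta_hat = self._ridge(X, y)         # mean reward estimate
        resid = y - X @ theta_hat             # residuals of the fit
        # Resample residuals with replacement to build pseudo-rewards.
        boot = self.rng.choice(resid, size=len(resid), replace=True)
        y_tilde = X @ theta_hat + boot
        theta_tilde = self._ridge(X, y_tilde)  # perturbed reward estimate
        return int(np.argmax(arms @ theta_tilde))

    def update(self, x, r):
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(r))
```

The bootstrap perturbation plays the role that posterior sampling plays in Thompson sampling: the randomness of the resampled residuals drives exploration, and it shrinks as the residuals of the fit shrink.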


