
Residual Bootstrap Exploration for Stochastic Linear Bandit

02/23/2022
by   Shuang Wu, et al.

We propose a new bootstrap-based online algorithm for stochastic linear bandit problems. The key idea is residual bootstrap exploration: the agent estimates the next-step reward by re-sampling the residuals of the mean-reward estimate. Our algorithm, residual bootstrap exploration for stochastic linear bandit, estimates the linear reward from its re-sampling distribution and pulls the arm with the highest reward estimate. In particular, we contribute a theoretical framework that demystifies residual bootstrap-based exploration mechanisms in stochastic linear bandit problems. The key insight is that the strength of bootstrap exploration rests on collaborated optimism between the online-learned model and the re-sampling distribution of residuals. This observation enables us to show that the proposed algorithm secures a high-probability Õ(d√n) sub-linear regret under mild conditions. Our experiments support the easy generalizability of the principle across various formulations of the linear bandit problem and show the significant computational efficiency of the proposed algorithm.
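The exploration mechanism described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' exact algorithm: it fits a ridge-regression estimate of the reward model, re-samples the fitted residuals with replacement to perturb the observed rewards, refits on the perturbed data, and greedily pulls the arm that maximizes the perturbed estimate. All names, the warm-up scheme, and the toy environment are assumptions for illustration.

```python
import numpy as np

def residual_bootstrap_bandit(arms, theta_true, n_rounds, lam=1.0,
                              noise_sd=0.1, seed=0):
    """Illustrative sketch of residual-bootstrap exploration for a
    stochastic linear bandit (not the paper's exact algorithm)."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    X, y = [], []
    total_reward = 0.0
    for t in range(n_rounds):
        if t < len(arms):
            # Warm-up: pull each arm once (an assumption of this sketch).
            a = t
        else:
            Xm = np.asarray(X)
            yv = np.asarray(y)
            A = Xm.T @ Xm + lam * np.eye(d)          # ridge Gram matrix
            theta_hat = np.linalg.solve(A, Xm.T @ yv)
            resid = yv - Xm @ theta_hat              # residuals of mean-reward fit
            boot = rng.choice(resid, size=len(resid), replace=True)
            y_star = Xm @ theta_hat + boot           # residual-bootstrap rewards
            theta_boot = np.linalg.solve(A, Xm.T @ y_star)
            a = int(np.argmax(arms @ theta_boot))    # greedy w.r.t. bootstrap estimate
        reward = arms[a] @ theta_true + noise_sd * rng.standard_normal()
        X.append(arms[a])
        y.append(reward)
        total_reward += reward
    return total_reward

# Toy environment: three orthogonal arms, arm 1 is best.
arms = np.eye(3)
theta_true = np.array([0.1, 0.9, 0.2])
total = residual_bootstrap_bandit(arms, theta_true, n_rounds=500)
```

The randomness injected by re-sampling the residuals plays the exploration role that posterior sampling plays in Thompson sampling; with small residuals the perturbation shrinks, so the agent naturally becomes greedier as the fit improves.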


10/15/2014

Thompson sampling with the online bootstrap

Thompson sampling provides a solution to bandit problems in which new ob...
05/04/2016

Linear Bandit algorithms using the Bootstrap

This study presents two new algorithms for solving linear stochastic ban...
07/31/2021

Debiasing Samples from Online Learning Using Bootstrap

It has been recently shown in the literature that the sample averages fr...
02/12/2018

Practical Evaluation and Optimization of Contextual Bandit Algorithms

We study and empirically optimize contextual bandit learning, exploratio...
09/08/2022

A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing

Amazon Customer Service provides real-time support for millions of custo...
02/19/2020

Residual Bootstrap Exploration for Bandit Algorithms

In this paper, we propose a novel perturbation-based exploration method ...
03/21/2019

Perturbed-History Exploration in Stochastic Linear Bandits

We propose a new online algorithm for minimizing the cumulative regret i...