Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

11/13/2018
by Branislav Kveton, et al.

We propose a multi-armed bandit algorithm that explores by randomizing its history. The key idea is to estimate the value of an arm from a bootstrap sample of its history, into which we add pseudo observations after each pull of the arm. The pseudo observations may seem harmful, but on the contrary, they guarantee that the bootstrap sample is optimistic with high probability. Because of this, we call our algorithm Giro, an abbreviation for "garbage in, reward out". We analyze Giro in a K-armed Bernoulli bandit and prove an O(K Δ^-1 log n) bound on its n-round regret, where Δ denotes the difference in the expected rewards of the optimal and the best suboptimal arms. The main advantage of our exploration strategy is that it can be applied with any reward generalization model, such as a neural network. We evaluate Giro and its contextual variant on several synthetic and real-world problems, and observe that it performs comparably to or better than state-of-the-art algorithms.
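To make the idea concrete, here is a minimal sketch of this loop in the Bernoulli case. It assumes a pull(i) callback that returns a binary reward; the function name and the parameter a (the number of positive and negative pseudo observations appended per pull) are our labels for illustration, not notation from the paper:

```python
import numpy as np

def giro(pull, K, n, a=1, rng=None):
    """Minimal sketch of Giro for a K-armed Bernoulli bandit.

    pull(i) is assumed to return a reward in {0, 1} for arm i.
    a is the number of positive and negative pseudo observations
    added to an arm's history after each of its pulls.
    """
    rng = np.random.default_rng() if rng is None else rng
    history = [[] for _ in range(K)]  # real + pseudo rewards per arm

    for _ in range(n):
        # Estimate each arm's value as the mean of a bootstrap sample
        # (sampling with replacement) of its augmented history.
        estimates = np.empty(K)
        for i in range(K):
            if not history[i]:
                estimates[i] = np.inf  # force an initial pull of each arm
            else:
                sample = rng.choice(history[i], size=len(history[i]))
                estimates[i] = sample.mean()
        arm = int(np.argmax(estimates))

        # Pull the arm, then store the observed reward together with
        # a fake successes and a fake failures: the "garbage" that keeps
        # the bootstrap sample optimistic with high probability.
        reward = pull(arm)
        history[arm].extend([reward] + [1] * a + [0] * a)
    return history
```

With a = 1, each pull appends the observed reward plus one fake success and one fake failure; the pseudo observations inflate the variance of the bootstrap mean, which is what drives exploration in this design.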


Related research

02/26/2019
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...

03/07/2021
CORe: Capitalizing On Rewards in Bandit Exploration
We propose a bandit algorithm that explores purely by randomizing its pa...

02/17/2020
Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack
The multi-armed bandit formalism has been extensively studied under vari...

02/03/2023
Multiplier Bootstrap-based Exploration
Despite the great interest in the bandit problem, designing efficient al...

12/16/2022
Materials Discovery using Max K-Armed Bandit
Search algorithms for the bandit problems are applicable in materials di...

02/19/2020
Residual Bootstrap Exploration for Bandit Algorithms
In this paper, we propose a novel perturbation-based exploration method ...

03/21/2019
Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...
