Perturbed-History Exploration in Stochastic Multi-Armed Bandits

02/26/2019
by Branislav Kveton, et al.

We propose an online algorithm for cumulative regret minimization in a stochastic multi-armed bandit. In round t, the algorithm adds O(t) i.i.d. pseudo-rewards to its history and then pulls the arm with the highest estimated value in this perturbed history. Therefore, we call it perturbed-history exploration (PHE). The pseudo-rewards are designed to offset the underestimated values of arms in round t with a sufficiently high probability. We analyze PHE in a K-armed bandit and prove an O(K Δ^{-1} log n) bound on its n-round regret, where Δ is the minimum gap between the expected rewards of the optimal and suboptimal arms. The key to our analysis is a novel argument showing that randomized Bernoulli pseudo-rewards lead to optimism. We compare PHE empirically to several baselines and show that it is competitive with the best of them.
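
The round-by-round rule described in the abstract is easy to sketch. Below is a minimal Python illustration for Bernoulli rewards: an arm with s pulls receives ceil(a * s) i.i.d. Bernoulli(1/2) pseudo-rewards, and the arm with the highest perturbed mean is pulled. The perturbation scale a = 1.1, the function name, and all variable names are our own illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def phe_bernoulli(arm_means, n_rounds, a=1.1, seed=0):
    """Minimal sketch of perturbed-history exploration (PHE) on a
    Bernoulli bandit. `a` is the perturbation scale (a hyperparameter);
    this is an illustration under our assumptions, not the paper's code."""
    rng = np.random.default_rng(seed)
    K = len(arm_means)
    pulls = np.zeros(K, dtype=int)   # s_i: number of pulls of arm i
    reward_sum = np.zeros(K)         # V_i: sum of observed rewards of arm i
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(n_rounds):
        if t < K:
            arm = t  # initialization: pull each arm once
        else:
            # Perturbed estimate: mix the real history of arm i with
            # ceil(a * s_i) i.i.d. Bernoulli(1/2) pseudo-rewards,
            # then take the mean of the perturbed history.
            scores = np.empty(K)
            for i in range(K):
                m = int(np.ceil(a * pulls[i]))
                pseudo = rng.binomial(m, 0.5)  # sum of m Bernoulli(1/2) draws
                scores[i] = (reward_sum[i] + pseudo) / (pulls[i] + m)
            arm = int(np.argmax(scores))

        r = float(rng.random() < arm_means[arm])  # Bernoulli reward
        pulls[arm] += 1
        reward_sum[arm] += r
        regret += best_mean - arm_means[arm]  # expected per-round regret
    return regret

# Example: 5-armed Bernoulli bandit with gap Δ = 0.2
print(phe_bernoulli([0.7, 0.5, 0.5, 0.5, 0.5], n_rounds=10000))
```

Note the design point the abstract emphasizes: the pseudo-rewards randomize the empirical mean upward often enough that an underestimated arm still gets pulled, which substitutes for an explicit confidence bound.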


Related research

11/13/2018
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...

03/21/2019
Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...

06/10/2015
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
This work addresses the problem of regret minimization in non-stochastic...

04/01/2022
Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk
We investigate a natural but surprisingly unstudied approach to the mult...

01/30/2020
HAMLET – A Learning Curve-Enabled Multi-Armed Bandit for Algorithm Selection
Automated algorithm selection and hyperparameter tuning facilitates the ...

12/13/2021
Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits
We consider the upper confidence bound strategy for Gaussian multi-armed...

02/15/2023
Bandit Social Learning: Exploration under Myopic Behavior
We study social learning dynamics where the agents collectively follow a...
