Efficient Algorithms for Extreme Bandits

03/21/2022
by Dorian Baudry, et al.

In this paper, we contribute to the Extreme Bandit problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward. We first study the concentration of the maximum of i.i.d. random variables under mild assumptions on the tails of the reward distributions. This analysis motivates the introduction of the Quantile of Maxima (QoMax). The properties of QoMax are sufficient to build an Explore-Then-Commit (ETC) strategy, QoMax-ETC, which achieves strong asymptotic guarantees despite its simplicity. We then propose and analyze a more adaptive, anytime algorithm, QoMax-SDA, which combines QoMax with a subsampling method recently introduced by Baudry et al. (2021). Both algorithms are more efficient than existing approaches in two respects: (1) they achieve better empirical performance, and (2) they significantly reduce memory and time complexity.
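The abstract names the QoMax statistic and the QoMax-ETC strategy without spelling them out, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that QoMax is the empirical quantile of order `q` of the maxima of several equal-size batches of rewards, and that the ETC strategy explores every arm with the same budget before committing to the arm with the largest QoMax. The function names, the parameters `n_batches`, `batch_size`, and `q`, and the Pareto arms in the usage example are all illustrative assumptions.

```python
import math
import random


def qomax(batches, q):
    """Empirical q-quantile of the per-batch maxima (assumed definition)."""
    maxima = sorted(max(batch) for batch in batches)
    # Order statistic of rank ceil(q * b), clamped to a valid rank in 1..b.
    rank = min(len(maxima), max(1, math.ceil(q * len(maxima))))
    return maxima[rank - 1]


def qomax_etc(arms, horizon, n_batches, batch_size, q=0.5, rng=None):
    """Explore every arm, then commit to the arm with the largest QoMax.

    `arms` is a list of callables taking a random.Random and returning a reward.
    Returns the index of the committed arm and the largest reward collected.
    """
    rng = rng or random.Random()
    best_reward = -math.inf
    scores = []
    # Exploration phase: n_batches * batch_size pulls per arm.
    for draw in arms:
        batches = [[draw(rng) for _ in range(batch_size)]
                   for _ in range(n_batches)]
        best_reward = max(best_reward, max(max(b) for b in batches))
        scores.append(qomax(batches, q))
    # Commit phase: spend the remaining budget on the selected arm.
    chosen = max(range(len(arms)), key=scores.__getitem__)
    remaining = horizon - len(arms) * n_batches * batch_size
    for _ in range(max(0, remaining)):
        best_reward = max(best_reward, arms[chosen](rng))
    return chosen, best_reward


if __name__ == "__main__":
    # Hypothetical arms: Pareto rewards, smaller shape means a heavier tail.
    arms = [lambda r, a=a: r.paretovariate(a) for a in (1.5, 2.5, 4.0)]
    chosen, best = qomax_etc(arms, horizon=5000, n_batches=10,
                             batch_size=50, rng=random.Random(0))
    print("committed arm:", chosen, "largest reward:", best)
```

In this sketch only the batch maxima are needed to score an arm, which hints at why a statistic of this kind could be cheap in memory. The batch count, batch size, and quantile level here are arbitrary choices made for illustration.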


