Corralling Stochastic Bandit Algorithms

06/16/2020
by Raman Arora, et al.

We study the problem of corralling stochastic bandit algorithms, that is, combining multiple bandit algorithms designed for a stochastic environment, with the goal of devising a corralling algorithm that performs almost as well as the best base algorithm. We give two general algorithms for this setting, which we show enjoy favorable regret guarantees. We show that the regret of the corralling algorithms is no worse than that of the best base algorithm containing the arm with the highest expected reward, and that it depends on the gap between that reward and the expected rewards of the other arms. We also provide lower bounds for this problem that further justify our approach.
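The paper's two algorithms are not reproduced here, but a minimal sketch may help fix ideas about the corralling setup: a meta-algorithm treats each base bandit algorithm as a "meta-arm" and uses a UCB-style rule to decide which base algorithm acts in each round, forwarding the observed reward to the chosen base. This is an illustrative assumption, not the paper's method; the class and function names (UCB1, Corral, pull) are hypothetical.

```python
import math
import random

class UCB1:
    """Standard UCB1 base algorithm over its own subset of arms.
    Illustrative sketch; not the paper's base-algorithm interface."""
    def __init__(self, arms):
        self.arms = arms                      # arm indices this base owns
        self.counts = {a: 0 for a in arms}
        self.sums = {a: 0.0 for a in arms}
        self.t = 0

    def select(self):
        self.t += 1
        for a in self.arms:                   # play each arm once first
            if self.counts[a] == 0:
                return a
        def ucb(a):                           # empirical mean + exploration bonus
            mean = self.sums[a] / self.counts[a]
            return mean + math.sqrt(2 * math.log(self.t) / self.counts[a])
        return max(self.arms, key=ucb)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

class Corral:
    """UCB-style corralling sketch: each base algorithm is a meta-arm."""
    def __init__(self, bases):
        self.bases = bases
        self.counts = [0] * len(bases)
        self.sums = [0.0] * len(bases)
        self.t = 0

    def step(self, pull):
        self.t += 1
        for i in range(len(self.bases)):      # try each base once first
            if self.counts[i] == 0:
                j = i
                break
        else:                                 # then pick base by UCB index
            j = max(range(len(self.bases)),
                    key=lambda i: self.sums[i] / self.counts[i]
                    + math.sqrt(2 * math.log(self.t) / self.counts[i]))
        base = self.bases[j]
        arm = base.select()                   # base algorithm chooses the arm
        reward = pull(arm)                    # environment feedback
        base.update(arm, reward)              # feed reward back to the base
        self.counts[j] += 1
        self.sums[j] += reward
        return arm, reward
```

A toy run on Bernoulli arms, with two base algorithms over disjoint arm sets, might look as follows; the corralling layer should concentrate its plays on the base containing the arm with the highest mean (arm 2 here):

```python
random.seed(0)
means = [0.2, 0.5, 0.9, 0.4]
pull = lambda a: 1.0 if random.random() < means[a] else 0.0
corral = Corral([UCB1([0, 1]), UCB1([2, 3])])
total = sum(corral.step(pull)[1] for _ in range(10_000))
print(f"average reward: {total / 10_000:.3f}")
```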


