Upper Confidence Bounds for Combining Stochastic Bandits

12/24/2020
by Ashok Cutkosky, et al.

We provide a simple method for combining stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of N individual bandit algorithms as an arm in a higher-level N-armed bandit problem, which we solve with a variant of the classic UCB algorithm. Our final regret depends only on the regret of the best base algorithm in hindsight. This approach offers a simple, intuitive alternative to the CORRAL algorithm for adversarial bandits, without requiring the stability conditions that CORRAL imposes on the base algorithms. Our results match lower bounds in several settings, and we validate the algorithm empirically on misspecified linear bandit and model selection problems.

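The meta-level construction is simple enough to sketch. Below is a minimal Python illustration, not the authors' exact procedure: it selects among base algorithms with the classic UCB1 index, whereas the paper uses a modified confidence bound, and the MetaUCB class name and the act()/update() interface of the base algorithms are assumptions made here for concreteness.

```python
import math

class MetaUCB:
    """Minimal sketch of the "meta-UCB" idea: treat each of N base
    bandit algorithms as an arm of a higher-level N-armed bandit and
    choose among them with a UCB index.

    NOTE: illustrative stand-in using the classic UCB1 index; the
    paper solves the meta-problem with a *variant* of UCB, not this
    exact rule. The act()/update() interface of the base algorithms
    is an assumption of this sketch.
    """

    def __init__(self, base_algorithms):
        self.bases = base_algorithms
        self.counts = [0] * len(base_algorithms)   # times each base was played
        self.means = [0.0] * len(base_algorithms)  # empirical mean reward per base

    def _select(self, t):
        # Play every base once before trusting the UCB indices.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # UCB1 index: empirical mean plus exploration bonus.
        return max(
            range(len(self.bases)),
            key=lambda i: self.means[i]
            + math.sqrt(2.0 * math.log(t) / self.counts[i]),
        )

    def step(self, t, environment):
        i = self._select(t)
        action = self.bases[i].act()          # the chosen base picks the arm
        reward = environment(action)          # environment: action -> reward
        self.bases[i].update(action, reward)  # only the chosen base learns
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
        return reward
```

One subtlety the sketch glosses over: since only the selected base algorithm observes feedback, the reward stream of each "base arm" comes from a learning algorithm and is neither independent nor stationary, which is in part why a plain UCB1 bonus does not suffice and the paper uses a modified variant.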

Related Research

Corralling Stochastic Bandit Algorithms (06/16/2020)
We study the problem of corralling stochastic bandit algorithms, that is...

On Thompson Sampling for Smoother-than-Lipschitz Bandits (01/08/2020)
Thompson Sampling is a well established approach to bandit and reinforce...

Empirical Bayes Regret Minimization (04/04/2019)
The prevalent approach to bandit algorithm design is to have a low-regre...

Periodic Bandits and Wireless Network Selection (04/28/2019)
Bandit-style algorithms have been studied extensively in stochastic and ...

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits (09/30/2022)
While standard bandit algorithms sometimes incur high regret, their perf...

Model Selection in Contextual Stochastic Bandit Problems (03/03/2020)
We study model selection in stochastic bandit problems. Our approach rel...

Simple Algorithms for Dueling Bandits (06/18/2019)
In this paper, we present simple algorithms for Dueling Bandits. We prov...
