Rate-adaptive model selection over a collection of black-box contextual bandit algorithms

06/05/2020
by Aurélien F. Bibaut, et al.

We consider the model selection task in the stochastic contextual bandit setting. Given a collection of base contextual bandit algorithms, we provide a master algorithm that combines them and achieves, up to constants, the same performance as the best base algorithm would have achieved had it been run on its own. Our approach only requires that each base algorithm satisfy a high-probability regret bound. The procedure is simple: for a well-chosen sequence of probabilities (p_t)_{t ≥ 1}, at each round t it either follows a candidate chosen at random (with probability p_t), or compares the candidates' cumulative rewards at a common internal sample size and follows the one that wins the comparison (with probability 1 - p_t). To the best of our knowledge, our proposal is the first to be rate-adaptive for a collection of general black-box contextual bandit algorithms: it achieves the same regret rate as the best candidate. We demonstrate the effectiveness of our method with simulation studies.
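
To make the selection rule concrete, below is a minimal Python sketch of the master loop described in the abstract. The BasePolicy interface (select_action, update), the environment methods get_context and get_reward, and the schedule p_schedule are hypothetical names introduced only for illustration; this is a sketch under those assumptions, not the paper's implementation.

```python
# Illustrative sketch only: BasePolicy, env.get_context, env.get_reward and
# p_schedule are hypothetical names, not the paper's actual interface.
import numpy as np


class BasePolicy:
    """Minimal black-box interface assumed for each candidate algorithm."""

    def __init__(self):
        self.rewards = []  # rewards observed on the rounds this candidate played

    def select_action(self, context):
        raise NotImplementedError

    def update(self, context, action, reward):
        raise NotImplementedError


def run_master(candidates, env, horizon, p_schedule, seed=0):
    """At round t: with probability p_t follow a candidate drawn at random;
    otherwise follow the candidate with the largest cumulative reward when
    all candidates are compared at the same internal sample size."""
    rng = np.random.default_rng(seed)
    for t in range(1, horizon + 1):
        p_t = p_schedule(t)
        n_min = min(len(c.rewards) for c in candidates)
        if rng.random() < p_t or n_min == 0:
            # Random selection of which candidate to follow this round.
            k = int(rng.integers(len(candidates)))
        else:
            # Compare cumulative rewards truncated to the smallest internal
            # sample size, so every candidate is judged on equal footing.
            scores = [sum(c.rewards[:n_min]) for c in candidates]
            k = int(np.argmax(scores))
        chosen = candidates[k]
        context = env.get_context()
        action = chosen.select_action(context)
        reward = env.get_reward(context, action)
        chosen.update(context, action, reward)
        chosen.rewards.append(reward)  # only the followed candidate advances
    return candidates
```

Here p_schedule stands in for the "well-chosen sequence of probabilities (p_t)" from the abstract; a decreasing schedule shifts weight from random candidate exploration toward following the empirical leader, but the specific schedule and its analysis are given in the paper, not in this sketch.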
