Model selection in the context of bandit optimization is a challenging p...
We study the problem of conservative off-policy evaluation (COPE) where ...
Existing generalization bounds fail to explain crucial factors that driv...
In practical applications, machine learning algorithms are often repeate...
We consider the bandit optimization problem with the reward function def...
Obtaining reliable, adaptive confidence sets for prediction functions (h...
Contextual bandits are a rich model for sequential decision making given...