The Pareto Frontier of model selection for general Contextual Bandits

10/25/2021
by   Teodor V. Marinov, et al.
0

Recent progress in model selection raises the question of the fundamental limits of these techniques. Under specific scrutiny has been model selection for general contextual bandits with nested policy classes, resulting in a COLT2020 open problem. It asks whether it is possible to obtain simultaneously the optimal single algorithm guarantees over all policies in a nested sequence of policy classes, or if otherwise this is possible for a trade-off α∈[1/2,1) between complexity term and time: ln(|Π_m|)^1-αT^α. We give a disappointing answer to this question. Even in the purely stochastic regime, the desired results are unobtainable. We present a Pareto frontier of up to logarithmic factors matching upper and lower bounds, thereby proving that an increase in the complexity term ln(|Π_m|) independent of T is unavoidable for general policy classes. As a side result, we also resolve a COLT2016 open problem concerning second-order bounds in full-information games.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/19/2020

Open Problem: Model Selection for Contextual Bandits

In statistical learning, algorithms for model selection allow the learne...
research
06/11/2021

Optimal Model Selection in Contextual Bandits with Many Classes via Offline Oracles

We study the problem of model selection for contextual bandits, in which...
research
02/19/2023

Estimating Optimal Policy Value in General Linear Contextual Bandits

In many bandit problems, the maximal reward achievable by a policy is of...
research
06/04/2020

Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits

We consider the problem of model selection for two popular stochastic li...
research
04/05/2022

Pareto-optimal clustering with the primal deterministic information bottleneck

At the heart of both lossy compression and clustering is a trade-off bet...
research
11/08/2021

Universal and data-adaptive algorithms for model selection in linear contextual bandits

Model selection in contextual bandits is an important complementary prob...
research
07/05/2021

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

A recurring theme in statistical learning, online learning, and beyond i...

Please sign up or login with your details

Forgot password? Click here to reset