Universal and data-adaptive algorithms for model selection in linear contextual bandits

11/08/2021
by Vidya Muthukumar, et al.

Model selection in contextual bandits is an important problem, complementary to regret minimization with respect to a fixed model class. We consider the simplest non-trivial instance of model selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem. Even in this instance, current state-of-the-art methods explore in a suboptimal manner and require strong "feature-diversity" conditions. In this paper, we introduce new algorithms that a) explore in a data-adaptive manner, and b) provide model selection guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$ with no feature-diversity conditions whatsoever, where $d$ denotes the dimension of the linear model and $T$ denotes the total number of rounds. The first algorithm enjoys a "best-of-both-worlds" property, simultaneously recovering two prior results that hold under distinct distributional assumptions. The second removes distributional assumptions altogether, expanding the scope for tractable model selection. Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
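To make the shape of a guarantee of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$ concrete, here is a small sketch that evaluates its leading term $d^{\alpha} T^{1-\alpha}$ for a few values. It is an illustration only, not part of the paper's algorithms: it assumes that the constants and logarithmic factors hidden by $\mathcal{O}(\cdot)$ can be ignored and that $\alpha$ lies in $[0, 1]$. The function name `model_selection_rate` and the chosen values of $d$, $T$ and $\alpha$ are hypothetical.

```python
import math


def model_selection_rate(d: int, T: int, alpha: float) -> float:
    """Leading term d**alpha * T**(1 - alpha) of an O(d^alpha T^(1-alpha)) bound.

    Assumption for this sketch: constants and log factors hidden by O(.) are
    dropped, so only ratios between returned values are meaningful.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1] for this sketch")
    return d ** alpha * T ** (1.0 - alpha)


# Hypothetical dimension and horizons, chosen only for illustration.
d = 100
for alpha in (0.25, 1 / 3, 0.5):
    for T in (10**4, 10**6):
        bound = model_selection_rate(d, T, alpha)
        # bound / T equals (d / T) ** alpha, which shrinks as T grows
        # whenever alpha > 0, i.e. the bound is sublinear in T.
        print(f"alpha={alpha:.3f}  T={T:>8}  bound~{bound:12.1f}  "
              f"per-round~{bound / T:.4f}")
```

For any $\alpha > 0$ the per-round value $(d/T)^{\alpha}$ decreases as $T$ grows, so the guarantee is sublinear in $T$; a larger $\alpha$ places more weight on the dimension $d$ and less on the horizon $T$.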
