Pareto Optimal Model Selection in Linear Bandits
We study a model selection problem in the linear bandit setting, where the learner must adapt on the fly to the dimension of the optimal hypothesis class while balancing exploration and exploitation. More specifically, we assume a sequence of nested linear hypothesis classes with dimensions d_1 < d_2 < …, and the goal is to automatically adapt to the smallest hypothesis class that contains the true linear model. Although previous papers provide various guarantees for this model selection problem, their analyses either apply only in favorable cases where one can cheaply conduct statistical tests to locate the right hypothesis class, or rely on "corralling" multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounding the regret. In this paper, we first establish a lower bound showing that, even with a fixed action set, adaptation to the unknown intrinsic dimension d_⋆ comes at a cost: no algorithm can achieve the regret bound O(√(d_⋆ T)) simultaneously for all values of d_⋆. We also bring new ideas to the model selection problem in linear bandits, namely constructing virtual mixture-arms that effectively summarize useful information. Under a mild assumption on the action set, we design a Pareto optimal algorithm whose guarantees match the rate in the lower bound. Experimental results confirm our theory and show the advantages of our algorithm over prior work.
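To make the setup concrete, the following sketch illustrates the nested-hypothesis-class structure the abstract describes: the true parameter θ has some intrinsic dimension d_⋆ (only its first d_⋆ coordinates are nonzero), and the learner's target is the smallest class containing it. The dimensions, seed, and helper function here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nested hypothesis classes of dimensions d_1 < d_2 < ...:
# class k models the reward of action a as <theta, a> with theta
# supported on the first d_k coordinates.
dims = [2, 4, 8, 16]  # hypothetical nested dimensions
D = dims[-1]

# Hypothetical ground truth with intrinsic dimension d_star = 4:
# theta is nonzero only on its first 4 coordinates.
d_star = 4
theta = np.zeros(D)
theta[:d_star] = rng.normal(size=d_star)

def smallest_containing_class(theta, dims, tol=1e-12):
    """Return the smallest d_k such that theta lies in the span of the
    first d_k coordinates -- the class the learner must adapt to."""
    for d in dims:
        if np.all(np.abs(theta[d:]) < tol):
            return d
    return dims[-1]

print(smallest_containing_class(theta, dims))  # -> 4
```

The lower bound says this adaptation is not free: no single algorithm can pay only O(√(d_⋆ T)) regret for every possible value of d_⋆ at once, so any algorithm trades off regret across the nested classes along a Pareto frontier.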