Pareto Optimal Model Selection in Linear Bandits

02/12/2021

∙

We study a model selection problem in the linear bandit setting, where the learner must adapt to the dimension of the optimal hypothesis class on the fly and balance exploration and exploitation. More specifically, we assume a sequence of nested linear hypothesis classes with dimensions d_1 < d_2 < …, and the goal is to automatically adapt to the smallest hypothesis class that contains the true linear model. Although previous papers provide various guarantees for this model selection problem, the analysis therein either works in favorable cases when one can cheaply conduct statistical testing to locate the right hypothesis class or is based on the idea of "corralling" multiple base algorithms which often performs relatively poorly in practice. These works also mainly focus on upper bounding the regret. In this paper, we first establish a lower bound showing that, even with a fixed action set, adaptation to the unknown intrinsic dimension d_⋆ comes at a cost: there is no algorithm that can achieve the regret bound O(√(d_⋆ T)) simultaneously for all values of d_⋆. We also bring new ideas, i.e., constructing virtual mixture-arms to effectively summarize useful information, into the model selection problem in linear bandits. Under a mild assumption on the action set, we design a Pareto optimal algorithm with guarantees matching the rate in the lower bound. Experimental results confirm our theoretical results and show advantages of our algorithm compared to prior work.

READ FULL TEXT

Pareto Optimal Model Selection in Linear Bandits

Sign in with Google

Consider DeepAI Pro