Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits

06/04/2020

We consider the problem of model selection for two popular stochastic linear bandit settings, and propose algorithms that adapt to the unknown problem complexity. In the first setting, we consider the K-armed mixture bandits, where the mean reward of arm i ∈ [K] is μ_i + ⟨α_{i,t}, θ^*⟩, with α_{i,t} ∈ R^d being the known context vector, and μ_i ∈ [-1,1] and θ^* being unknown parameters. We define ‖θ^*‖ as the problem complexity and consider a sequence of nested hypothesis classes, each positing a different upper bound on ‖θ^*‖. Exploiting this, we propose Adaptive Linear Bandit (ALB), a novel phase-based algorithm that adapts to the true problem complexity, ‖θ^*‖. We show that ALB achieves regret scaling of O(‖θ^*‖√T), where ‖θ^*‖ is a priori unknown. As a corollary, when θ^* = 0, ALB recovers the minimax regret for the simple bandit problem without knowing θ^* in advance. ALB is the first algorithm that uses the parameter norm as a model selection criterion for linear bandits. Prior state-of-the-art algorithms <cit.> achieve a regret of O(L√T), where L is an upper bound on ‖θ^*‖ that is supplied as an input to the algorithm. In the second setting, we consider the standard linear bandit problem (with possibly an infinite number of arms), where the sparsity of θ^*, denoted by d^* ≤ d, is unknown to the algorithm. Defining d^* as the problem complexity, we show that ALB achieves O(d^*√T) regret, matching that of an oracle that knows the true sparsity level. This is the first algorithm that achieves such model selection guarantees, resolving an open problem in <cit.>. We further verify through synthetic and real-data experiments that the performance gains are fundamental and not artifacts of mathematical bounds.
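As a rough illustration of the mixture-bandit reward model described above (not of ALB itself), the following sketch computes the mean reward μ_i + ⟨α_{i,t}, θ^*⟩ for each arm and picks the best arm in a round. The function names and the small instance (K = 3, d = 2, and all numeric values) are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def mean_reward(mu_i, alpha_it, theta_star):
    """Mean reward mu_i + <alpha_{i,t}, theta^*> of one arm in one round."""
    return mu_i + sum(a * t for a, t in zip(alpha_it, theta_star))


def best_arm(mu, contexts, theta_star):
    """Index of the arm with the largest mean reward in this round."""
    means = [mean_reward(m, c, theta_star) for m, c in zip(mu, contexts)]
    return max(range(len(means)), key=means.__getitem__)


# Hypothetical instance (assumed values): K = 3 arms, d = 2 context dimensions.
mu = [0.1, 0.5, -0.2]                              # arm biases mu_i in [-1, 1]
contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # alpha_{i,t} for one round t

theta_zero = [0.0, 0.0]   # theta^* = 0: contexts are irrelevant
theta_big = [2.0, 2.0]    # larger norm: contexts dominate the biases

arm_when_zero = best_arm(mu, contexts, theta_zero)
arm_when_big = best_arm(mu, contexts, theta_big)
```

With θ^* = 0 the mean rewards reduce to the biases μ_i, so the instance behaves like a context-free K-armed bandit; as ‖θ^*‖ grows, the contextual term decides which arm is best. This is the sense in which ‖θ^*‖ acts as a measure of problem complexity in the first setting.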
