
Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

by Debangshu Banerjee, et al.

We present a non-asymptotic lower bound on the eigenspectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as Ω(√n) whenever the expected cumulative regret of the algorithm is O(√n), where n is the learning horizon and the action space has a constant Hessian around the optimal arm. Such action spaces therefore force a polynomial lower bound. This contrasts with the logarithmic lower bound shown by <cit.> for discrete (i.e., well-separated) action spaces. Furthermore, that previous result holds only in the asymptotic regime (as n → ∞), whereas our result for these “locally rich” action spaces is anytime. Additionally, under a mild technical assumption, we obtain a similar lower bound on the minimum eigenvalue that holds with high probability. We apply our result to two practical scenarios: model selection and clustering in linear bandits. For model selection, we use our novel spectral bound to show that an epoch-based linear bandit algorithm adapts to the true model complexity at a rate exponential in the number of epochs. For clustering, we consider a multi-agent framework and, by leveraging the spectral result, show that no forced exploration is necessary. The agents can run a linear bandit algorithm and estimate their underlying parameters at the same time, and hence incur low regret.
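To make the quantity in the main result concrete, the sketch below computes the design matrix V_n = Σ_t x_t x_tᵀ and its minimum eigenvalue for a toy two-dimensional linear bandit whose action set is the unit circle. The circle is a simple example of a curved, "locally rich" action set. The policy is a plain ridge-regression greedy rule, and this is entirely an illustrative assumption, not the paper's algorithm. We do not claim it attains O(√n) regret here. The helper names `run_greedy_on_circle` and `min_eigenvalue_2x2`, the parameter vector, the noise level and the ridge value are all hypothetical. A single run gives one sample of V_n, while the paper's bound concerns the *expected* design matrix.

```python
import math
import random


def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]] (closed form)."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean - radius


def run_greedy_on_circle(theta, n, noise_sd=0.1, ridge=1.0, seed=0):
    """Hypothetical toy: ridge-regression greedy play on the unit circle.

    Returns the entries (a, b, c) of the unregularized design matrix
    V_n = sum_t x_t x_t^T and the cumulative pseudo-regret over n rounds.
    """
    rng = random.Random(seed)
    a = b = c = 0.0          # entries of V_n = [[a, b], [b, c]]
    r0 = r1 = 0.0            # sum_t reward_t * x_t
    regret = 0.0
    best = math.hypot(theta[0], theta[1])  # best reward on the unit circle is ||theta||
    for _ in range(n):
        # Ridge estimate (V_n + ridge * I)^{-1} r, using the 2x2 inverse in closed form.
        va, vb, vc = a + ridge, b, c + ridge
        det = va * vc - vb * vb
        est0 = (vc * r0 - vb * r1) / det
        est1 = (-vb * r0 + va * r1) / det
        norm = math.hypot(est0, est1)
        if norm == 0.0:
            # No information yet: play a uniformly random direction.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            x = (math.cos(angle), math.sin(angle))
        else:
            # Greedy arm on the circle: the unit vector along the current estimate.
            x = (est0 / norm, est1 / norm)
        mean_reward = theta[0] * x[0] + theta[1] * x[1]
        reward = mean_reward + rng.gauss(0.0, noise_sd)
        regret += best - mean_reward
        a += x[0] * x[0]
        b += x[0] * x[1]
        c += x[1] * x[1]
        r0 += reward * x[0]
        r1 += reward * x[1]
    return (a, b, c), regret


if __name__ == "__main__":
    for horizon in (100, 400, 1600):
        (a, b, c), regret = run_greedy_on_circle((1.0, 0.5), horizon)
        lam = min_eigenvalue_2x2(a, b, c)
        # Print regret, lambda_min(V_n), and lambda_min(V_n) / sqrt(n).
        print(horizon, round(regret, 2), round(lam, 2), round(lam / math.sqrt(horizon), 3))
```

The greedy direction θ̂_t/‖θ̂_t‖ wobbles around the optimal arm as the estimate is refined. On a curved set, this wobble produces some mass in directions orthogonal to the optimum without any forced exploration. Each action has unit norm, so the trace of V_n always equals n.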



