Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

07/23/2022
by   Debangshu Banerjee, et al.

We present a non-asymptotic lower bound on the eigenspectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as Ω(√(n)) whenever the expected cumulative regret of the algorithm is O(√(n)), where n is the learning horizon and the action space has a constant Hessian around the optimal arm. This shows that such action spaces force a polynomial lower bound on exploration, in contrast to the logarithmic lower bound shown by <cit.> for discrete (i.e., well-separated) action spaces. Furthermore, while the previous result holds only in the asymptotic regime (as n → ∞), our result for these "locally rich" action spaces is any-time. Additionally, under a mild technical assumption, we obtain a similar lower bound on the minimum eigenvalue holding with high probability. We apply our result to two practical scenarios: model selection and clustering in linear bandits. For model selection, we show that an epoch-based linear bandit algorithm adapts to the true model complexity at a rate exponential in the number of epochs, by virtue of our novel spectral bound. For clustering, we consider a multi-agent framework in which we show, by leveraging the spectral result, that no forced exploration is necessary: the agents can run a linear bandit algorithm and estimate their underlying parameters simultaneously, and hence incur low regret.
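The central claim — that a low-regret algorithm on a curved (sphere-like) action set automatically accumulates information in every direction, so that λ_min of the design matrix A_n = Σ_t a_t a_tᵀ grows with n — can be illustrated empirically. The sketch below is an assumed setup, not the paper's construction: it runs a standard LinUCB-style rule over fresh random unit vectors each round (a proxy for a smooth action set) and tracks the minimum eigenvalue of the unregularized design matrix at a few horizons.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's algorithm): LinUCB over
# random unit-sphere actions; we track lambda_min of A_n = sum_t a_t a_t^T.
rng = np.random.default_rng(0)
d, n, K, lam, beta, sigma = 5, 2000, 50, 1.0, 1.0, 0.1
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)          # unknown parameter, unit norm

V = lam * np.eye(d)                     # regularized Gram matrix for estimation
b = np.zeros(d)
A = np.zeros((d, d))                    # unregularized design matrix
checkpoints, min_eigs = [], []

for t in range(1, n + 1):
    arms = rng.normal(size=(K, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)  # points on the sphere
    Vinv = np.linalg.inv(V)
    theta_hat = Vinv @ b
    # optimistic index: estimated reward + exploration bonus
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", arms, Vinv, arms))
    a = arms[np.argmax(arms @ theta_hat + beta * bonus)]
    r = a @ theta + sigma * rng.normal()
    V += np.outer(a, a)
    b += r * a
    A += np.outer(a, a)
    if t in (200, 500, 1000, 2000):
        checkpoints.append(t)
        min_eigs.append(np.linalg.eigvalsh(A)[0])  # smallest eigenvalue

for t, lmin in zip(checkpoints, min_eigs):
    print(t, round(lmin, 3))
```

Since each update adds a positive semi-definite rank-one term, λ_min(A_n) is non-decreasing in n; the interesting part, mirroring the paper's Ω(√(n)) bound, is that it keeps growing even though the algorithm is not forced to explore.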


