
Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

by Yingkai Li, et al.

We study the linear contextual bandit problem with finite action sets. When the problem dimension is d, the time horizon is T, and there are n ≤ 2^{d/2} candidate actions per time period, we (1) show that the minimax expected regret is Ω(√(dT log T log n)) for every algorithm, and (2) introduce a Variable-Confidence-Level (VCL) SupLinUCB algorithm whose regret matches the lower bound up to iterated logarithmic factors. Our algorithmic result saves two √(log T) factors from previous analysis, and our information-theoretical lower bound also improves previous results by one √(log T) factor, revealing a regret scaling quite different from classical multi-armed bandits, in which no logarithmic T term is present in the minimax regret. Our proof techniques include variable confidence levels and a careful analysis of layer sizes of SupLinUCB on the upper bound side, and delicately constructed adversarial sequences showing the tightness of elliptical potential lemmas on the lower bound side.
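The abstract refers to elliptical potential lemmas without stating one. As a rough illustration only, the sketch below evaluates the standard textbook form of the lemma numerically: for vectors x_t with norm at most 1, A_0 = λI and A_t = A_{t-1} + x_t x_t^T, the sum of min(1, x_t^T A_{t-1}^{-1} x_t) is at most 2 log(det A_T / det A_0), which is in turn at most 2d log(1 + T/(dλ)). The function names, the random unit-vector sequence, and the values of d, T and λ are assumptions made for this example; this is not the paper's adversarial construction or its VCL SupLinUCB algorithm.

```python
import math
import random


def elliptical_potential(xs, lam=1.0):
    """Return (potential, log_det_ratio) for a sequence of vectors xs.

    potential     = sum_t min(1, x_t^T A_{t-1}^{-1} x_t)
    log_det_ratio = log det(A_T) - log det(lam * I)
    where A_0 = lam * I and A_t = A_{t-1} + x_t x_t^T.
    """
    d = len(xs[0])
    # Inverse of A_0 = lam * I.
    a_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    potential = 0.0
    log_det_ratio = 0.0
    for x in xs:
        # v = A^{-1} x and u = x^T A^{-1} x (A^{-1} stays symmetric).
        v = [sum(a_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        u = sum(x[i] * v[i] for i in range(d))
        potential += min(1.0, u)
        # Matrix determinant lemma: det(A + x x^T) = det(A) * (1 + u).
        log_det_ratio += math.log1p(u)
        # Sherman-Morrison rank-one update of the inverse.
        for i in range(d):
            for j in range(d):
                a_inv[i][j] -= v[i] * v[j] / (1.0 + u)
    return potential, log_det_ratio


def random_unit_vector(d, rng):
    """Hypothetical helper: a uniformly random direction of norm 1."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


rng = random.Random(0)
d, T, lam = 5, 2000, 1.0  # illustrative values, not taken from the paper
xs = [random_unit_vector(d, rng) for _ in range(T)]
potential, log_det_ratio = elliptical_potential(xs, lam)
upper = 2.0 * d * math.log(1.0 + T / (d * lam))
print(potential, 2.0 * log_det_ratio, upper)
```

For a random sequence like this one the potential typically sits well below the 2d log(1 + T/(dλ)) bound; the lower-bound argument described in the abstract concerns carefully chosen sequences for which the lemma is tight.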



Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

Linear contextual bandit is a class of sequential decision making proble...

Regret Analysis of the Anytime Optimally Confident UCB Algorithm

I introduce and analyse an anytime version of the Optimally Confident UC...

On the Minimax Regret for Online Learning with Feedback Graphs

In this work, we improve on the upper and lower bounds for the regret of...

On the Minimax Regret for Linear Bandits in a wide variety of Action Spaces

As noted in the works of <cit.>, it has been mentioned that it is an ope...

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

In this paper, we study the problem of stochastic linear bandits with fi...

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

We study the optimal batch-regret tradeoff for batch linear contextual b...

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems

The analysis of online least squares estimation is at the heart of many ...