Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

03/30/2019
by Yingkai Li, et al.

We study the linear contextual bandit problem with finite action sets. When the problem dimension is d, the time horizon is T, and there are n ≤ 2^(d/2) candidate actions per time period, we (1) show that the minimax expected regret is Ω(√(dT log T log n)) for every algorithm, and (2) introduce a Variable-Confidence-Level (VCL) SupLinUCB algorithm whose regret matches the lower bound up to iterated logarithmic factors. Our algorithmic result saves two √(log T) factors from previous analysis, and our information-theoretical lower bound also improves previous results by one √(log T) factor, revealing a regret scaling quite different from classical multi-armed bandits, in which no logarithmic T term is present in the minimax regret. Our proof techniques include variable confidence levels and a careful analysis of layer sizes of SupLinUCB on the upper bound side, and delicately constructed adversarial sequences showing the tightness of elliptical potential lemmas on the lower bound side.
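
To make the scaling in the abstract concrete, the following is a minimal numerical sketch, not part of the paper. It evaluates the rate √(dT log T log n) with all constants and iterated-logarithm factors dropped, and multiplies it by log T to illustrate what "two √(log T) factors" amounts to. The function names and the requirements n ≥ 2 and T ≥ 2 are assumptions made here so that the logarithms are positive.

```python
import math


def minimax_rate(d: int, T: int, n: int) -> float:
    """Rate sqrt(d * T * log T * log n) from the abstract, constants dropped.

    Hypothetical helper for illustration only; it is not the paper's algorithm.
    """
    # Assumption made here (not in the abstract): n >= 2 and T >= 2,
    # so that log n and log T are strictly positive.
    if n < 2 or T < 2:
        raise ValueError("need n >= 2 and T >= 2")
    # Regime stated in the abstract: n <= 2^(d/2), checked in log space
    # to avoid overflow for large d.
    if math.log2(n) > d / 2:
        raise ValueError("the abstract's regime requires n <= 2^(d/2)")
    return math.sqrt(d * T * math.log(T) * math.log(n))


def rate_with_two_extra_sqrt_log_factors(d: int, T: int, n: int) -> float:
    """The same rate multiplied by (sqrt(log T))^2 = log T.

    Illustrates the size of the two sqrt(log T) factors the abstract says
    are saved; it is not a statement of any specific earlier bound.
    """
    return minimax_rate(d, T, n) * math.log(T)


if __name__ == "__main__":
    d, T, n = 20, 10_000, 1_000
    tight = minimax_rate(d, T, n)
    loose = rate_with_two_extra_sqrt_log_factors(d, T, n)
    print(f"rate: {tight:.1f}, with two extra sqrt(log T) factors: {loose:.1f}")
    print(f"ratio: {loose / tight:.3f} (equals log T = {math.log(T):.3f})")
```

The ratio between the two quantities is exactly log T, so the saving grows slowly with the horizon but never disappears.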

research
05/04/2019

Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

Linear contextual bandit is a class of sequential decision making proble...
research
03/29/2016

Regret Analysis of the Anytime Optimally Confident UCB Algorithm

I introduce and analyse an anytime version of the Optimally Confident UC...
research
05/24/2023

On the Minimax Regret for Online Learning with Feedback Graphs

In this work, we improve on the upper and lower bounds for the regret of...
research
01/09/2023

On the Minimax Regret for Linear Bandits in a wide variety of Action Spaces

As noted in the works of <cit.>, it has been mentioned that it is an ope...
research
04/28/2020

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

In this paper, we study the problem of stochastic linear bandits with fi...
research
10/15/2021

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

We study the optimal batch-regret tradeoff for batch linear contextual b...
research
02/14/2011

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems

The analysis of online least squares estimation is at the heart of many ...
