Truncated LinUCB for Stochastic Linear Bandits

02/23/2022
by Yanglei Song, et al.

This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed d-dimensional random vectors, and the expected rewards are linear in both the arm parameters and the contexts. The LinUCB algorithm, which is near-minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension d and the time horizon T because it over-explores. A truncated version of LinUCB, termed "Tr-LinUCB", is proposed: it follows LinUCB up to a truncation time S and performs pure exploitation afterwards. Tr-LinUCB is shown to achieve O(d log(T)) regret if S = C d log(T) for a sufficiently large constant C, and a matching lower bound is established, which shows that Tr-LinUCB is rate-optimal in both d and T in a low-dimensional regime. Further, if S = d log^κ(T) for some κ > 1, the regret exceeds the optimal rate only by a multiplicative log log(T) factor, which does not depend on d. This insensitivity to overshooting the truncation time is of practical importance.
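The truncation rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one ridge-regression estimate per arm, a fixed bonus multiplier `alpha`, a ridge parameter `ridge`, and estimates that keep updating after the truncation time. The class name `TrLinUCB`, the helper `truncation_time`, and the constant `c` are hypothetical names chosen for this example.

```python
import math


def truncation_time(dim, horizon, c=1.0):
    """S = ceil(C * d * log(T)); c is a hypothetical tuning constant."""
    return math.ceil(c * dim * math.log(horizon))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _matvec(m, v):
    return [_dot(row, v) for row in m]


class TrLinUCB:
    """Sketch: LinUCB up to round S, greedy (pure exploitation) afterwards."""

    def __init__(self, n_arms, dim, s, alpha=1.0, ridge=1.0):
        self.n_arms = n_arms
        self.s = s            # truncation time S
        self.alpha = alpha    # assumed fixed width of the confidence bonus
        self.t = 0            # number of rounds seen so far
        # A_inv[k] is the inverse of ridge * I + sum of x x^T over pulls of arm k.
        self.A_inv = [
            [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
             for i in range(dim)]
            for _ in range(n_arms)
        ]
        # b[k] is the sum of reward * x over pulls of arm k.
        self.b = [[0.0] * dim for _ in range(n_arms)]

    def select(self, x):
        """Return the arm to pull for context x (ties go to the lowest index)."""
        self.t += 1
        explore = self.t <= self.s
        best_arm, best_score = 0, -math.inf
        for k in range(self.n_arms):
            theta_hat = _matvec(self.A_inv[k], self.b[k])
            score = _dot(theta_hat, x)
            if explore:
                # Upper confidence bonus, used only before truncation.
                width = _dot(x, _matvec(self.A_inv[k], x))
                score += self.alpha * math.sqrt(max(width, 0.0))
            if score > best_score:
                best_arm, best_score = k, score
        return best_arm

    def update(self, arm, x, reward):
        """Rank-one (Sherman-Morrison) update of the pulled arm's statistics."""
        a_inv = self.A_inv[arm]
        ax = _matvec(a_inv, x)  # A^{-1} x; A^{-1} is symmetric
        denom = 1.0 + _dot(x, ax)
        for i in range(len(x)):
            for j in range(len(x)):
                a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]
```

In each round the caller would pass the observed context to `select`, pull the returned arm, and feed the observed reward back through `update`. Setting `s` to the horizon recovers plain LinUCB in this sketch, while `s = truncation_time(d, T, c)` gives the S = C d log(T) schedule from the abstract.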
