Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

06/11/2022
by Wonyoung Kim et al.

We propose a novel algorithm for linear contextual bandits with an O(√(dT log T)) regret bound, where d is the dimension of the contexts and T is the time horizon. Our algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the outcome of this randomization, the estimator draws either on the contexts of all arms or on the selected contexts. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into additive dimension-dependent terms instead of multiplicative ones. We also prove a novel lower bound of Ω(√(dT)) in our problem setting. Hence, the regret of our algorithm matches the lower bound up to logarithmic factors. Numerical experiments support the theoretical guarantees and show that our method outperforms existing linear bandit algorithms.
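To make "self-normalized bound" concrete, here is a minimal sketch of the *standard* ridge-regression estimator for a linear contextual bandit and the quantity such bounds control, ‖S_t‖ in the V_t⁻¹ norm, where V_t = λI + Σ x_s x_sᵀ and S_t = Σ η_s x_s. It checks that quantity against the classical self-normalized bound of Abbasi-Yadkori et al. (2011) for σ-sub-Gaussian noise. This is not the paper's estimator or its new bound: the baseline below uses only the selected context each round, whereas the abstract describes an estimator that, depending on its randomization, can also draw on the contexts of all arms. The simulation setup (unit-norm θ, Gaussian contexts, uniformly random arm choice) and all function names (`ridge_demo`, `solve`, `logdet_spd`) are hypothetical choices for illustration.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix copy
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def logdet_spd(A):
    """log det(A) for a symmetric positive-definite A, via Cholesky factorization."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def ridge_demo(d=3, K=5, T=500, lam=1.0, sigma=0.1, delta=0.01, seed=0):
    """Hypothetical simulation: baseline ridge estimator on a linear bandit."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
    nrm = math.sqrt(sum(t * t for t in theta))
    theta = [t / nrm for t in theta]  # unit-norm true parameter (assumed)

    V = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]  # lam*I + sum x x^T
    b = [0.0] * d  # sum of x_s * r_s
    S = [0.0] * d  # sum of eta_s * x_s (noise-weighted selected contexts)
    for _ in range(T):
        # K arm contexts are observed, but the baseline uses only the selected one.
        contexts = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)]
                    for _ in range(K)]
        x = contexts[rng.randrange(K)]  # uniform arm choice, illustration only
        eta = rng.gauss(0.0, sigma)  # Gaussian noise is sigma-sub-Gaussian
        r = sum(xi * ti for xi, ti in zip(x, theta)) + eta
        for i in range(d):
            b[i] += x[i] * r
            S[i] += x[i] * eta
            for j in range(d):
                V[i][j] += x[i] * x[j]

    theta_hat = solve(V, b)  # ridge estimate V^{-1} b
    sn = math.sqrt(sum(s * v for s, v in zip(S, solve(V, S))))  # ||S||_{V^{-1}}
    # Classical bound, holding w.p. >= 1 - delta:
    # ||S||_{V^{-1}}^2 <= 2 sigma^2 log(det(V)^{1/2} det(lam I)^{-1/2} / delta)
    bound = sigma * math.sqrt(
        2.0 * (0.5 * logdet_spd(V) - 0.5 * d * math.log(lam) + math.log(1.0 / delta)))
    err = math.sqrt(sum((a - c) ** 2 for a, c in zip(theta_hat, theta)))
    return theta_hat, err, sn, bound


theta_hat, err, sn, bound = ridge_demo()
print(f"estimation error {err:.4f}, ||S||_(V^-1) = {sn:.4f}, bound {bound:.4f}")
```

The right-hand side of the classical bound grows only with log det(V_t), roughly d log T, which is why bounds of this type are the usual route to d-dependent regret terms. The abstract's contribution is a self-normalized bound for a different estimator that yields additive rather than multiplicative dimension-dependent terms.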


Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits (09/15/2022)
We propose a novel contextual bandit algorithm for generalized linear re...

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs (04/28/2020)
In this paper, we study the problem of stochastic linear bandits with fi...

Efficient Linear Bandits through Matrix Sketching (09/28/2018)
We prove that two popular linear contextual bandit algorithms, OFUL and ...

Truncated LinUCB for Stochastic Linear Bandits (02/23/2022)
This paper considers contextual bandits with a finite number of arms, wh...

Nearly Minimax Algorithms for Linear Bandits with Shared Representation (03/29/2022)
We give novel algorithms for multi-task and lifelong linear bandits with...

Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback (01/31/2023)
We consider the linear contextual multi-class multi-period packing probl...

Learning in Distributed Contextual Linear Bandits Without Sharing the Context (06/08/2022)
Contextual linear bandits is a rich and theoretically important model th...
