Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

05/04/2019
by   Yingkai Li, et al.

Linear contextual bandits are a class of sequential decision-making problems with important applications in recommendation systems, online advertising, healthcare, and other machine learning tasks. Despite much prior research, tight regret bounds for linear contextual bandits with infinite action sets remain open. In this paper, we prove a regret upper bound of O(√(d^2 T log T)) × poly(log log T), where d is the domain dimension and T is the time horizon. Our upper bound matches the previous lower bound of Ω(√(d^2 T log T)) up to iterated-logarithmic terms.
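To make the setting concrete, the following is a minimal sketch of a standard optimistic algorithm for linear contextual bandits (a LinUCB/OFUL-style rule, not the algorithm from this paper): rewards are linear in a d-dimensional feature vector, and at each round the learner picks the arm maximizing a ridge-regression estimate plus an exploration bonus. All names, the noise level, and the finite sampled action set are illustrative assumptions.

```python
import numpy as np

def linucb_regret(T=200, d=3, n_arms=10, alpha=1.0, seed=0):
    """Toy LinUCB-style simulation; returns cumulative regret over T rounds.

    The reward of arm x is <x, theta_star> plus Gaussian noise, with
    theta_star unknown to the learner (a standard linear bandit model,
    here with a freshly sampled finite action set each round).
    """
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d)
    theta_star /= np.linalg.norm(theta_star)  # unknown true parameter
    A = np.eye(d)      # ridge-regularized Gram matrix of chosen features
    b = np.zeros(d)    # running sum of reward-weighted features
    regret = 0.0
    for _ in range(T):
        # Sample this round's action set from the unit sphere.
        X = rng.normal(size=(n_arms, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b  # ridge-regression estimate of theta_star
        # Optimistic score: estimated reward + confidence-width bonus.
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
        x = X[np.argmax(X @ theta_hat + alpha * bonus)]
        reward = x @ theta_star + 0.1 * rng.normal()  # noisy linear reward
        A += np.outer(x, x)
        b += reward * x
        regret += np.max(X @ theta_star) - x @ theta_star
    return regret
```

The exploration bonus shrinks in well-explored directions of feature space, which is what yields the sublinear-in-T regret that results like the one above sharpen to (near-)optimal rates.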


