First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

05/01/2023
by Julia Olkhovskaya, et al.

We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of K arms to change over time without restriction. Assuming the d-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of T rounds is known to scale as Õ(√(Kd T)). Under the additional assumption that the density of the contexts is log-concave, we obtain a second-order bound of order Õ(K√(d V_T)) in terms of the cumulative second moment of the learner's losses V_T, and a closely related first-order bound of order Õ(K√(d L_T^*)) in terms of the cumulative loss of the best policy L_T^*. Since V_T or L_T^* may be significantly smaller than T, these improve over the worst-case regret whenever the environment is relatively benign. Our results are obtained using a truncated version of the continuous exponential weights algorithm over the probability simplex, which we analyse by exploiting a novel connection to the linear bandit setting without contexts.
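
To make the comparison between the three rates concrete, the following minimal sketch evaluates them numerically. It is an illustration only: it drops all constants and the logarithmic factors hidden by the Õ notation, and the function names and the example values of K, d, T and V_T are assumptions chosen for this sketch, not taken from the paper. Under that simplification, K√(d V_T) ≤ √(Kd T) holds exactly when V_T ≤ T/K, so the second-order rate only improves on the worst-case rate once the cumulative second moment falls below T/K; the same threshold applies to L_T^* in the first-order rate.

```python
import math


def worst_case_rate(K: int, d: int, T: float) -> float:
    """sqrt(K d T): the worst-case rate, constants and log factors dropped."""
    return math.sqrt(K * d * T)


def second_order_rate(K: int, d: int, V_T: float) -> float:
    """K sqrt(d V_T): the second-order rate, constants and log factors dropped."""
    return K * math.sqrt(d * V_T)


def first_order_rate(K: int, d: int, L_star: float) -> float:
    """K sqrt(d L_T^*): the first-order rate, constants and log factors dropped."""
    return K * math.sqrt(d * L_star)


def crossover(K: int, T: float) -> float:
    """Value of V_T (or L_T^*) at which the adaptive and worst-case rates coincide."""
    return T / K


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    K, d, T = 4, 10, 100_000
    print(f"worst-case rate: {worst_case_rate(K, d, T):.1f}")
    print(f"crossover at V_T = T/K = {crossover(K, T):.1f}")
    for V_T in (100.0, 1_000.0, crossover(K, T), float(T)):
        rate = second_order_rate(K, d, V_T)
        verdict = "improves" if rate < worst_case_rate(K, d, T) else "does not improve"
        print(f"V_T = {V_T:>9.1f}: second-order rate {rate:8.1f} ({verdict})")
```

The sketch shows the extra √K factor carried by the adaptive rates: for V_T close to T they are larger than the worst-case rate, and the gain appears only in benign environments where V_T or L_T^* is well below T/K.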


