On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

03/16/2023
by Weitong Zhang, et al.

We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level ζ > 0. We propose an algorithm based on a novel data selection scheme, which selects only the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level ζ is dominated by Õ(Δ/√d), with Δ being the minimal sub-optimality gap and d being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound Õ(d^2/Δ) as in the well-specified setting, up to logarithmic factors. In addition, we show that an existing algorithm, SupLinUCB (Chu et al., 2011), can also achieve a gap-dependent constant regret bound without knowledge of the sub-optimality gap Δ. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between the misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when ζ ≤ Õ(Δ/√d); and (2) it is not efficiently learnable when ζ ≥ Ω̃(Δ/√d). Experiments on both synthetic and real-world datasets corroborate our theoretical results.
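The data selection idea described above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the uncertainty of a context x is measured by the usual ridge-regression norm sqrt(x^T A^{-1} x), and the class name `UncertaintyFilteredRidge`, the regularizer `lam`, and the cutoff `threshold` are hypothetical choices made for illustration only. A context enters the regression only when its uncertainty is at least the cutoff; otherwise it is skipped and the estimate is left unchanged.

```python
import math
from typing import List


class UncertaintyFilteredRidge:
    """Online ridge regression that only learns from high-uncertainty contexts.

    Hypothetical sketch: the threshold and regularizer are illustrative
    assumptions, not values taken from the paper.
    """

    def __init__(self, dim: int, lam: float = 1.0, threshold: float = 0.6):
        self.dim = dim
        self.threshold = threshold
        # Inverse of A = lam * I + sum of x x^T over the selected contexts.
        self.a_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * x over the selected contexts
        self.n_selected = 0

    def _a_inv_times(self, x: List[float]) -> List[float]:
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.a_inv]

    def uncertainty(self, x: List[float]) -> float:
        """Return sqrt(x^T A^{-1} x) for the current design matrix."""
        v = self._a_inv_times(x)
        return math.sqrt(sum(xi * vi for xi, vi in zip(x, v)))

    def update(self, x: List[float], reward: float) -> bool:
        """Use (x, reward) for regression only if x is uncertain enough."""
        v = self._a_inv_times(x)
        quad = sum(xi * vi for xi, vi in zip(x, v))
        if math.sqrt(quad) < self.threshold:
            return False  # skipped: this direction is already well explored
        # Sherman-Morrison rank-one update of A^{-1} after adding x x^T.
        denom = 1.0 + quad
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= v[i] * v[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]
        self.n_selected += 1
        return True

    def theta(self) -> List[float]:
        """Ridge estimate A^{-1} b built from the selected contexts only."""
        return self._a_inv_times(self.b)
```

Feeding the same context repeatedly shows the effect: its uncertainty shrinks with every accepted update, so after a few rounds it falls below the cutoff and further copies are ignored. How the cutoff should relate to ζ, Δ, and d is specified in the full paper and is not reproduced here.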


research
06/16/2020

Q-learning with Logarithmic Regret

This paper presents the first non-asymptotic result showing that a model...
research
11/23/2020

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Reinforcement learning (RL) with linear function approximation has recei...
research
02/28/2022

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

We consider learning a stochastic bandit model, where the reward functio...
research
09/10/2019

Optimality of the Subgradient Algorithm in the Stochastic Setting

Recently Jaouad Mourtada and Stéphane Gaïffas showed the anytime hedge a...
research
09/28/2018

Efficient Linear Bandits through Matrix Sketching

We prove that two popular linear contextual bandit algorithms, OFUL and ...
research
10/25/2021

Linear Contextual Bandits with Adversarial Corruptions

We study the linear contextual bandit problem in the presence of adversa...
research
02/21/2019

Certainty Equivalent Control of LQR is Efficient

We study the performance of the certainty equivalent controller on the L...
