Linear Contextual Bandits with Adversarial Corruptions

10/25/2021
by   Heyang Zhao, et al.

We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the reward up to a corruption level C measured by the sum of the largest alteration on rewards in each round. We present a variance-aware algorithm that is adaptive to the level of adversarial contamination C. The key algorithmic design includes (1) a multi-level partition scheme of the observed data, (2) a cascade of confidence sets that are adaptive to the level of the corruption, and (3) a variance-aware confidence set construction that can take advantage of low-variance reward. We further prove that the regret of the proposed algorithm is $\tilde{O}\big(C^2 d \sqrt{\sum_{t=1}^{T} \sigma_t^2} + C^2 R \sqrt{dT}\big)$, where d is the dimension of context vectors, T is the number of rounds, R is the range of noise and $\sigma_t^2$, $t = 1, \ldots, T$, are the variances of instantaneous reward. We also prove a gap-dependent regret bound for the proposed algorithm, which is instance-dependent and thus leads to better performance on good practical instances. To the best of our knowledge, this is the first variance-aware corruption-robust algorithm for contextual bandits. Experiments on synthetic data corroborate our theory.
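To make the shape of the stated regret bound concrete, the following minimal Python sketch evaluates its two terms for given problem parameters. It is only an illustration of the formula in the abstract, not the paper's algorithm: the names `BoundTerms` and `regret_bound_terms` are hypothetical, and the constants and logarithmic factors hidden by the tilde-O notation are ignored, so the numbers indicate scaling rather than actual regret.

```python
import math
from typing import NamedTuple, Sequence


class BoundTerms(NamedTuple):
    """The two terms of the stated bound, up to constants and log factors."""
    variance_term: float  # C^2 * d * sqrt(sum_t sigma_t^2)
    range_term: float     # C^2 * R * sqrt(d * T)

    @property
    def total(self) -> float:
        return self.variance_term + self.range_term


def regret_bound_terms(
    C: float, d: int, R: float, variances: Sequence[float]
) -> BoundTerms:
    """Evaluate both terms of the bound from the abstract.

    C: corruption level, d: context dimension, R: range of the noise,
    variances: per-round reward variances sigma_t^2 (its length is T).
    Hypothetical helper for illustration only.
    """
    T = len(variances)
    if T == 0:
        raise ValueError("need at least one round")
    if any(v < 0 for v in variances):
        raise ValueError("variances must be non-negative")
    variance_term = C**2 * d * math.sqrt(sum(variances))
    range_term = C**2 * R * math.sqrt(d * T)
    return BoundTerms(variance_term, range_term)


if __name__ == "__main__":
    # Same C, d, R and T; only the per-round reward variance differs.
    low = regret_bound_terms(C=2.0, d=4, R=1.0, variances=[0.01] * 100)
    high = regret_bound_terms(C=2.0, d=4, R=1.0, variances=[0.25] * 100)
    print("low variance :", low, "total =", low.total)
    print("high variance:", high, "total =", high.total)
```

In this sketch, shrinking the per-round variances reduces only the first term, which is the sense in which the bound is variance-aware; the second term depends on the noise range R and the horizon T regardless of the variances.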


