Self-Tuning Bandits over Unknown Covariate-Shifts

07/16/2020
by Joseph Suk, et al.

Bandits with covariates, a.k.a. contextual bandits, address situations where the optimal actions (or arms) at a given time t depend on a context x_t, e.g., a new patient's medical history or a consumer's past purchases. While it is understood that the distribution of contexts might change over time, e.g., due to seasonality or deployment to new environments, the bulk of studies concern the most adversarial such changes, resulting in regret bounds that are often worst-case in nature. Covariate shift, on the other hand, has been considered in classification as a middle-ground formalism that can capture mild to relatively severe changes in distribution. We consider nonparametric bandits under such middle-ground scenarios and derive new regret bounds that tightly capture a continuum of changes in the context distribution. Furthermore, we show that these rates can be attained adaptively, i.e., without knowledge of either the time or the amount of the shift.
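To make the covariate-shift setting concrete, the following toy Python sketch simulates contexts on [0, 1] whose distribution changes at a time unknown to the learner, and runs a simple binned UCB baseline over a fixed discretization of the context space. This is purely illustrative: the shift time, the Beta context distributions, the Lipschitz reward functions, and the binned-UCB rule are all hypothetical choices, not the self-tuning procedure studied in the paper.

```python
# Toy simulation of a nonparametric contextual bandit under an unknown covariate shift.
# Illustrative sketch only: the binned UCB policy below is a standard baseline,
# NOT the paper's self-tuning algorithm, and all constants are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

T = 20_000           # horizon
shift_time = 12_000  # time of the covariate shift (unknown to the learner)
n_bins = 20          # fixed discretization of the context space [0, 1]
K = 2                # number of arms

# Lipschitz mean-reward functions f_k(x): arm 0 is better for small x, arm 1 for large x.
mean_reward = [lambda x: 0.7 - 0.4 * x, lambda x: 0.3 + 0.4 * x]

def draw_context(t):
    # Covariate shift: contexts concentrate near 0 early on, near 1 after the shift.
    return rng.beta(2, 5) if t < shift_time else rng.beta(5, 2)

# Per-(bin, arm) statistics for a simple UCB rule.
counts = np.zeros((n_bins, K))
sums = np.zeros((n_bins, K))

regret = 0.0
for t in range(T):
    x = draw_context(t)
    b = min(int(x * n_bins), n_bins - 1)

    # UCB index within the context's bin (infinite for arms not yet pulled in this bin).
    with np.errstate(divide="ignore", invalid="ignore"):
        means = np.where(counts[b] > 0, sums[b] / counts[b], np.inf)
        bonus = np.where(counts[b] > 0, np.sqrt(2 * np.log(t + 1) / counts[b]), np.inf)
    arm = int(np.argmax(means + bonus))

    # Bernoulli reward with Lipschitz mean; regret is measured against the best arm at x.
    mus = [f(x) for f in mean_reward]
    reward = rng.binomial(1, mus[arm])
    regret += max(mus) - mus[arm]

    counts[b, arm] += 1
    sums[b, arm] += reward

print(f"cumulative regret over T={T}: {regret:.1f}")
```

Because the context distribution shifts mass from one region of [0, 1] to another, bins that were rarely visited before the shift suddenly matter, which is exactly the kind of change the paper's regret bounds are designed to capture without prior knowledge of when or how much the distribution moved.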



Related research:

- Tracking Most Severe Arm Changes in Bandits (12/27/2021)
  In bandits with distribution shifts, one aims to automatically detect an...

- Advances in Bandits with Knapsacks (02/01/2020)
  "Bandits with Knapsacks" () is a general model for multi-armed bandits u...

- Tracking Most Significant Shifts in Nonparametric Contextual Bandits (07/11/2023)
  We study nonparametric contextual bandits where Lipschitz mean reward fu...

- Worst-case Performance of Greedy Policies in Bandits with Imperfect Context Observations (04/10/2022)
  Contextual bandits are canonical models for sequential decision-making u...

- First- and Second-Order Bounds for Adversarial Linear Contextual Bandits (05/01/2023)
  We consider the adversarial linear contextual bandit setting, which allo...

- Triply Robust Off-Policy Evaluation (11/13/2019)
  We propose a robust regression approach to off-policy evaluation (OPE) f...

- Invariant Policy Learning: A Causal Perspective (06/01/2021)
  In the past decade, contextual bandit and reinforcement learning algorit...
