Contextual Bandits under Delayed Feedback

07/05/2018
by Claire Vernade, et al.

Delayed feedback is a ubiquitous problem in many industrial systems that employ bandit algorithms. Most of these systems seek to optimize binary indicators such as clicks. In that case, when the reward is not sent immediately, the learner cannot distinguish a negative signal from a positive one that has not yet arrived: she might be waiting for feedback that will never come. In this paper, we define and address the contextual bandit problem with delayed and censored feedback by providing a new UCB-based algorithm. To demonstrate its effectiveness, we provide a finite-time regret analysis and an empirical evaluation that compares it against a baseline commonly used in practice.
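To illustrate the setting, the sketch below simulates a LinUCB-style linear contextual bandit in which each binary reward arrives after a random delay and is only delivered if that delay stays within a fixed censoring window m; rewards delayed beyond the window are silently lost, so the learner cannot tell them apart from zeros. The window m, the geometric delay distribution, and the LinUCB update are all assumptions chosen for illustration; this is not the algorithm proposed in the paper.

import numpy as np

rng = np.random.default_rng(0)

d, K, T = 5, 10, 5000     # context dimension, number of arms per round, horizon
m = 50                    # censoring window: feedback delayed beyond m rounds is never observed
alpha = 1.0               # width of the UCB exploration bonus (illustrative choice)

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

# LinUCB statistics: A = I + sum of x x^T, b = sum of x times the observed reward.
A = np.eye(d)
b = np.zeros(d)

pending = []              # (arrival_round, context, reward) for feedback still "in flight"

for t in range(T):
    # 1. Deliver any pending rewards whose delay has now elapsed.
    still_pending = []
    for arrival, x, r in pending:
        if arrival <= t:
            b += r * x    # the learner finally observes this reward
        else:
            still_pending.append((arrival, x, r))
    pending = still_pending

    # 2. Observe K random unit-norm contexts and play the one with the largest UCB index.
    contexts = rng.normal(size=(K, d))
    contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    bonus = np.sqrt(np.sum((contexts @ A_inv) * contexts, axis=1))
    x = contexts[np.argmax(contexts @ theta_hat + alpha * bonus)]

    # 3. Update the design matrix now; the reward itself is delayed and possibly censored.
    A += np.outer(x, x)
    r = rng.binomial(1, np.clip(x @ theta_star / 2 + 0.5, 0.0, 1.0))  # synthetic Bernoulli reward
    delay = rng.geometric(0.05)   # random delay in rounds (assumed geometric)
    if delay <= m:                # rewards delayed beyond the window never reach the learner
        pending.append((t + delay, x, r))

Because censored positives never enter b, their contribution is indistinguishable from a zero reward, which is exactly the ambiguity described in the abstract.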


Related research

01/02/2019 | Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
We investigate the feasibility of learning from both fully-labeled super...

11/16/2022 | Dueling Bandits: From Two-dueling to Multi-dueling
We study a general multi-dueling bandit problem, where an agent compares...

02/17/2023 | Graph Feedback via Reduction to Regression
When feedback is partial, leveraging all available information is critic...

02/20/2020 | Regret Minimization in Stochastic Contextual Dueling Bandits
We consider the problem of stochastic K-armed dueling bandit in the cont...

02/01/2022 | Regret Minimization with Performative Feedback
In performative prediction, the deployment of a predictive model trigger...

02/07/2023 | Leveraging User-Triggered Supervision in Contextual Bandits
We study contextual bandit (CB) problems, where the user can sometimes r...

06/05/2020 | Learning Multiclass Classifier Under Noisy Bandit Feedback
This paper addresses the problem of multiclass classification with corru...
