Leveraging User-Triggered Supervision in Contextual Bandits

02/07/2023
by Alekh Agarwal, et al.

We study contextual bandit (CB) problems in which the user can sometimes respond with the best action for a given context. Such an interaction arises, for example, in text prediction or autocompletion settings, where a poor suggestion is simply ignored and the user enters the desired text instead. Crucially, this extra feedback is user-triggered and arrives on only a subset of the contexts. We develop a new framework for leveraging such signals while remaining robust to their biased nature. We also augment standard CB algorithms to exploit the signal, and show improved regret guarantees for the resulting algorithms under a variety of conditions on the helpfulness of, and the bias inherent in, this feedback.
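To make the interaction protocol concrete, the following is a minimal sketch in Python, not the paper's algorithm or analysis: an epsilon-greedy learner with per-action linear scores receives bandit feedback every round, and on a biased subset of rounds, modeled here as rounds where the suggestion is clearly suboptimal, a simulated user also reveals the best action. The linear environment, the 0.5 trigger threshold, and the preference-style update are illustrative assumptions, not details from the paper.

    import numpy as np

    # Minimal sketch of a contextual bandit loop with user-triggered supervision.
    # Everything below is illustrative; it is not the algorithm from the paper.
    rng = np.random.default_rng(0)
    n_actions, dim, horizon, lr, eps = 5, 8, 2000, 0.05, 0.1

    true_w = rng.normal(size=(n_actions, dim))  # simulated environment (assumption)
    est_w = np.zeros((n_actions, dim))          # learner's per-action linear scores

    for t in range(horizon):
        x = rng.normal(size=dim)

        # Epsilon-greedy choice over the estimated scores.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(est_w @ x))

        # Bandit feedback: a noisy reward is observed for the chosen action only.
        r = true_w[a] @ x + 0.1 * rng.normal()
        est_w[a] += lr * (r - est_w[a] @ x) * x

        # User-triggered supervision: the user reveals the desired action only when
        # the suggestion was poor (a biased subset of contexts, as in autocompletion).
        best = int(np.argmax(true_w @ x))
        if best != a and true_w[best] @ x - true_w[a] @ x > 0.5:
            # Only the identity of the best action is revealed, not its reward,
            # so apply a simple preference-style update to the scores.
            if est_w[best] @ x <= est_w[a] @ x:
                est_w[best] += lr * x
                est_w[a] -= lr * x

The point of the sketch is when the extra label arrives: because corrections show up only after poor suggestions, treating them as unbiased full-information samples can skew the learner, which is exactly the robustness issue the abstract highlights.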

Related research

02/23/2020  Survey Bandits with Regret Guarantees
We consider a variant of the contextual bandit problem. In standard cont...

01/02/2019  Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
We investigate the feasibility of learning from both fully-labeled super...

10/29/2018  Heteroscedastic Bandits with Reneging
Although shown to be useful in many areas as models for solving sequenti...

07/05/2018  Contextual Bandits under Delayed Feedback
Delayed feedback is a ubiquitous problem in many industrial systems emp...

01/05/2021  Sequential Choice Bandits with Feedback for Personalizing users' experience
In this work, we study sequential choice bandits with feedback. We propo...

02/17/2023  Graph Feedback via Reduction to Regression
When feedback is partial, leveraging all available information is critic...

09/04/2019  Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity
Text-rich heterogeneous information networks (text-rich HINs) are ubiqui...
