Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

01/02/2019
by Chicheng Zhang, et al.

We investigate the feasibility of learning from both fully labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may differ between the two data sources. Theoretically, we present algorithms with no-regret guarantees that are robust to divergences between the two sources. Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approaches are both feasible and helpful in practice.
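The abstract describes combining a fully supervised data source with bandit feedback while staying robust when the two signals disagree. Below is a minimal, hypothetical sketch of that general idea using scikit-learn, not the paper's algorithm: a classifier policy is warm-started on labeled examples and then updated online with epsilon-greedy, inverse-propensity-scored (IPS) bandit feedback. The trust weight `lam`, the toy reward model `W`, and all other names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_actions, n_features = 5, 20
W = rng.normal(size=(n_features, n_actions))   # hidden toy reward structure (assumed)

# --- Supervised source: contexts with fully revealed best actions ---
X_sup = rng.normal(size=(500, n_features))
y_sup = np.argmax(X_sup @ W, axis=1)

policy = SGDClassifier(loss="log_loss")
policy.partial_fit(X_sup, y_sup, classes=np.arange(n_actions))  # warm start

# --- Bandit phase: only the chosen action's reward is observed ---
eps, lam = 0.1, 0.5   # exploration rate and supervised-trust weight (illustrative)
for t in range(2000):
    x = rng.normal(size=(1, n_features))
    if rng.random() < eps:
        a = int(rng.integers(n_actions))
        prop = eps / n_actions
    else:
        a = int(policy.predict(x)[0])
        prop = 1.0 - eps + eps / n_actions
    r = 1.0 if a == int(np.argmax(x @ W)) else 0.0   # reward seen for the chosen action only

    # IPS-style update: treat the chosen action as a label weighted by reward / propensity.
    policy.partial_fit(x, [a], sample_weight=[r / prop])

    # Replay one supervised example, scaled down by lam, so a supervised source that
    # diverges from the bandit rewards cannot dominate the online signal.
    i = int(rng.integers(len(X_sup)))
    policy.partial_fit(X_sup[i:i + 1], y_sup[i:i + 1], sample_weight=[lam])
```

In this sketch, setting `lam` near 1 trusts the supervised labels heavily, while setting it near 0 falls back to pure bandit learning; the paper's contribution is choosing such a combination in a principled, regret-bounded way rather than by hand.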

