Contextual Semibandits via Supervised Learning Oracles

02/20/2015
by   Akshay Krishnamurthy, et al.

We study an online decision-making problem in which, on each round, a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and collects a reward that is linearly related to this feedback. These problems, known as contextual semibandits, arise in crowdsourcing, recommendation, and many other domains. This paper reduces contextual semibandits to supervised learning, allowing us to leverage powerful supervised learning methods in this partial-feedback setting. Our first reduction applies when the mapping from feedback to reward is known and leads to a computationally efficient algorithm with near-optimal regret. We show that this algorithm outperforms state-of-the-art approaches on real-world learning-to-rank datasets, demonstrating the advantage of oracle-based algorithms. Our second reduction applies to the previously unstudied setting in which the linear mapping from feedback to reward is unknown. Our regret guarantees are superior to those of prior techniques that ignore the feedback.
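
To make the interaction protocol concrete, here is a minimal Python sketch of a single round with a known linear mapping from feedback to reward. It illustrates only the setting described in the abstract, not the paper's algorithm. All names (`play_round`, `top_k_policy`, the toy context, the `relevant` set, and the position weights) are hypothetical and chosen for illustration.

```python
def play_round(context, policy, feedback_fn, weights):
    """Run one contextual semibandit round (illustrative sketch only).

    context     -- side information observed before acting
    policy      -- maps a context to a list of chosen items
    feedback_fn -- returns the scalar feedback for one chosen item
    weights     -- assumed-known linear map from feedback to reward
    """
    chosen = policy(context)
    # Semibandit feedback: one scalar per chosen item, nothing for the rest.
    feedback = [feedback_fn(context, item) for item in chosen]
    # The reward is a linear function of the observed feedback vector.
    reward = sum(w * y for w, y in zip(weights, feedback))
    return chosen, feedback, reward


def top_k_policy(context, k=2):
    """Hypothetical policy: pick the k items with the highest context score."""
    return sorted(context, key=context.get, reverse=True)[:k]


# Toy example: the context assigns a score to each item; items in `relevant`
# yield feedback 1.0 and all others yield 0.0 (made-up data).
context = {"a": 0.9, "b": 0.1, "c": 0.5}
relevant = {"a", "b"}

chosen, feedback, reward = play_round(
    context,
    top_k_policy,
    lambda ctx, item: 1.0 if item in relevant else 0.0,
    weights=[1.0, 0.5],
)
print(chosen, feedback, reward)
```

In the second setting the abstract mentions, `weights` would be unknown to the learner, which would have to act using only the per-item feedback and the scalar reward.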

research
06/13/2011

Efficient Optimal Learning for Contextual Bandits

We address the problem of learning in an online setting where the learne...
research
07/21/2022

Delayed Feedback in Generalised Linear Bandits Revisited

The stochastic generalised linear bandit is a well-understood model for ...
research
05/29/2021

On the Theory of Reinforcement Learning with Once-per-Episode Feedback

We study a theory of reinforcement learning (RL) in which the learner re...
research
09/07/2021

Learning to Bid in Contextual First Price Auctions

In this paper, we investigate the problem about how to bid in repeated c...
research
09/17/2020

Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

We considered a novel practical problem of online learning with episodic...
research
03/03/2020

Contextual Search for General Hypothesis Classes

We study a general version of the problem of online learning under binar...
research
10/30/2015

CONQUER: Confusion Queried Online Bandit Learning

We present a new recommendation setting for picking out two items from a...
