Efficient Optimal Learning for Contextual Bandits

06/13/2011
by   Miroslav Dudík, et al.
0

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses a cost sensitive classification learner as an oracle and has a running time polylog(N), where N is the number of classification rules among which the oracle might choose. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm with regret that is additive rather than multiplicative in feedback delay as in all previous work.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/22/2023

Context-lumpable stochastic bandits

We consider a contextual bandit problem with S contexts and A actions. I...
research
02/20/2015

Contextual Semibandits via Supervised Learning Oracles

We study an online decision making problem where on each round a learner...
research
02/04/2014

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

We present a new algorithm for the contextual bandit learning problem, w...
research
06/09/2021

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

We consider the following variant of contextual linear bandits motivated...
research
07/24/2023

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

We consider the problem of contextual bandits and imitation learning, wh...
research
02/06/2016

BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits

We present efficient algorithms for the problem of contextual bandits wi...
research
06/06/2019

Stochastic Bandits with Context Distributions

We introduce a novel stochastic contextual bandit model, where at each s...

Please sign up or login with your details

Forgot password? Click here to reset