Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

02/12/2020
by   Dylan J. Foster, et al.
9

A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/12/2021

Adapting to Misspecification in Contextual Bandits

A major research direction in contextual bandits is to develop algorithm...
research
03/03/2018

Practical Contextual Bandits with Regression Oracles

A major challenge in contextual bandits is to design general-purpose alg...
research
07/05/2021

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

A recurring theme in statistical learning, online learning, and beyond i...
research
02/17/2023

Graph Feedback via Reduction to Regression

When feedback is partial, leveraging all available information is critic...
research
10/08/2020

Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination

In this work we revisit two classic high-dimensional online learning pro...
research
06/10/2020

Efficient Contextual Bandits with Continuous Actions

We create a computationally tractable algorithm for contextual bandits w...
research
03/05/2020

Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits

We propose the Generalized Policy Elimination (GPE) algorithm, an oracle...

Please sign up or login with your details

Forgot password? Click here to reset