Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability

03/28/2020
by   David Simchi-Levi, et al.
0

We consider the general (stochastic) contextual bandit problem under the realizability assumption, i.e., the expected reward, as a function of contexts and actions, belongs to a general function class F. We design a fast and simple algorithm that achieves the statistically optimal regret with only O(log T) calls to an offline least-squares regression oracle across all T rounds (the number of oracle calls can be further reduced to O(loglog T) if T is known in advance). Our algorithm provides the first universal and optimal reduction from contextual bandits to offline regression, solving an important open problem for the realizable setting of contextual bandits. Our algorithm is also the first provably optimal contextual bandit algorithm with a logarithmic number of oracle calls.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/04/2014

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

We present a new algorithm for the contextual bandit learning problem, w...
research
10/21/2022

Optimal Contextual Bandits with Knapsacks under Realizibility via Regression Oracles

We study the stochastic contextual bandit with knapsacks (CBwK) problem,...
research
04/27/2015

Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

We study contextual bandits with budget and time constraints, referred t...
research
11/27/2021

Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization

Offline policy learning (OPL) leverages existing data collected a priori...
research
05/02/2023

Stochastic Contextual Bandits with Graph-based Contexts

We naturally generalize the on-line graph prediction problem to a versio...
research
11/27/2022

Counterfactual Optimism: Rate Optimal Regret for Stochastic Contextual MDPs

We present the UC^3RL algorithm for regret minimization in Stochastic Co...
research
09/07/2022

Dual Instrumental Method for Confounded Kernelized Bandits

The contextual bandit problem is a theoretically justified framework wit...

Please sign up or login with your details

Forgot password? Click here to reset