Provably and Practically Efficient Neural Contextual Bandits

by Sudeep Salgia, et al.

We consider the neural contextual bandit problem. In contrast to existing work, which primarily focuses on ReLU neural nets, we consider a general set of smooth activation functions. Under this more general setting, (i) we derive non-asymptotic error bounds on the difference between an overparameterized neural net and its corresponding neural tangent kernel, and (ii) we propose an algorithm with a provably sublinear regret bound that is also efficient in the finite regime, as demonstrated by empirical studies. The non-asymptotic error bounds may be of broader interest as a tool for relating the smoothness of activation functions in neural contextual bandits to the smoothness of kernels in kernel bandits.
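To make the neural-net/NTK relationship concrete, here is a minimal, hypothetical sketch (not the paper's code) of the empirical neural tangent kernel of a one-hidden-layer network with a smooth activation (tanh). For a network f(x) = (1/√m) · a᛫tanh(Wx), the empirical NTK is the inner product of parameter gradients at two inputs; as the width m grows, it concentrates around the limiting kernel, and the paper's non-asymptotic bounds quantify that gap for smooth activations.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 10000                      # input dimension, hidden width
W = rng.standard_normal((m, d))      # hidden weights, entries ~ N(0, 1)
a = rng.standard_normal(m)           # output weights, entries ~ N(0, 1)

def empirical_ntk(x, xp):
    """k_m(x, x') = <grad_theta f(x), grad_theta f(x')>, theta = (W, a),
    for f(x) = (1/sqrt(m)) * sum_i a_i * tanh(w_i . x)."""
    h, hp = W @ x, W @ xp
    # gradient w.r.t. a_i is tanh(h_i)/sqrt(m)
    part_a = np.sum(np.tanh(h) * np.tanh(hp)) / m
    # gradient w.r.t. w_i is a_i * (1 - tanh(h_i)^2) * x / sqrt(m)
    part_W = np.sum(a**2 * (1 - np.tanh(h)**2) * (1 - np.tanh(hp)**2)) * (x @ xp) / m
    return part_a + part_W

x = rng.standard_normal(d); x /= np.linalg.norm(x)
xp = rng.standard_normal(d); xp /= np.linalg.norm(xp)
print(empirical_ntk(x, x), empirical_ntk(x, xp))
```

Replacing tanh with another smooth activation only changes the two derivative terms; the concentration argument is the same, which is what makes the general smooth-activation setting tractable.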


Related papers:

- Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes
- Neural Contextual Bandits without Regret
- Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
- Kernel ε-Greedy for Contextual Bandits
- Efficient Contextual Bandits with Knapsacks via Regression
- An Empirical Study of Neural Kernel Bandits
- Practical Contextual Bandits with Regression Oracles
