
Neural Contextual Bandits with Upper Confidence Bound-Based Exploration

by Dongruo Zhou et al.

We study the stochastic contextual bandit problem, where the reward is generated from an unknown bounded function with additive noise. We propose the NeuralUCB algorithm, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) of the reward for efficient exploration. We prove that, under mild assumptions, NeuralUCB achieves Õ(√T) regret, where T is the number of rounds. To the best of our knowledge, ours is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee. Preliminary experimental results on synthetic data corroborate our theory and shed light on potential applications of our algorithm to real-world problems.
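To make the idea concrete, here is a minimal NumPy sketch of a NeuralUCB-style selection rule: a tiny two-layer network scores each candidate context, the network's parameter gradient serves as the feature map, and the optimistic bonus comes from the gradient's norm under the inverse design matrix. All names, dimensions, and hyperparameter values (`d`, `m`, `lam`, `gamma`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer ReLU network f(x; theta); in the paper this would be deep.
d, m = 4, 16          # context dimension, hidden width (illustrative values)
W1 = rng.normal(scale=1 / np.sqrt(d), size=(m, d))
w2 = rng.normal(scale=1 / np.sqrt(m), size=m)

def predict_and_grad(x):
    """Return f(x) and the gradient of f w.r.t. all parameters, flattened."""
    h = W1 @ x
    a = np.maximum(h, 0.0)                 # ReLU activation
    f = w2 @ a
    # df/dW1 = outer(w2 * 1[h > 0], x); df/dw2 = a
    gW1 = np.outer(w2 * (h > 0), x)
    return f, np.concatenate([gW1.ravel(), a])

p = m * d + m                              # total parameter count
lam, gamma = 1.0, 1.0                      # regularization, exploration width
Z = lam * np.eye(p)                        # design matrix built from gradients

def select_arm(contexts):
    """Pick the arm maximizing the UCB score f(x) + gamma * width(x)."""
    best_arm, best_ucb = None, -np.inf
    for arm, x in enumerate(contexts):
        f, g = predict_and_grad(x)
        width = np.sqrt(g @ np.linalg.solve(Z, g) / m)
        ucb = f + gamma * width
        if ucb > best_ucb:
            best_arm, best_ucb = arm, ucb
    return best_arm

# One round: three candidate arms, choose one, update the design matrix.
contexts = [rng.normal(size=d) for _ in range(3)]
arm = select_arm(contexts)
_, g = predict_and_grad(contexts[arm])
Z += np.outer(g, g) / m
print("chosen arm:", arm)
```

In the full algorithm, the network parameters would also be updated by gradient descent on the observed rewards after each round; this sketch shows only the UCB-based exploration step.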




Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

Balancing exploration and exploitation (EE) is a fundamental problem in ...

Neural Contextual Bandits with Deep Representation and Shallow Exploration

We study a general class of contextual bandits, where each context-actio...

Learning Contextual Bandits Through Perturbed Rewards

Thanks to the power of representation learning, neural contextual bandit...

Federated Neural Bandit

Recent works on neural contextual bandit have achieved compelling perfor...

Linear Contextual Bandits with Adversarial Corruptions

We study the linear contextual bandit problem in the presence of adversa...

Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

We consider an adversarial variant of the classic K-armed linear context...

Contextual Bandits with Stochastic Experts

We consider the problem of contextual bandits with stochastic experts, w...

Code Repositories


Study NeuralUCB and regret analysis for contextual bandit with neural decision
