Neural Contextual Bandits with Upper Confidence Bound-Based Exploration

11/11/2019
by Dongruo Zhou, et al.

We study the stochastic contextual bandit problem, where the reward is generated from an unknown bounded function with additive noise. We propose the NeuralUCB algorithm, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) on the reward for efficient exploration. We prove that, under mild assumptions, NeuralUCB achieves Õ(√T) regret, where T is the number of rounds. To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee. Preliminary experimental results on synthetic data corroborate our theory and shed light on potential applications of our algorithm to real-world problems.
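To make the idea of a network-derived UCB concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that the "random feature mapping" is the gradient of a randomly initialized one-hidden-layer ReLU network with respect to its parameters, and that the exploration bonus is a quadratic form of that gradient in the inverse of a running matrix Z. The class name `TinyNeuralUCB`, the hyperparameters (`width`, `lam`, `gamma`, `lr`), the single-SGD-step training rule, and the cosine reward in the usage loop are all hypothetical choices made for illustration.

```python
import numpy as np


class TinyNeuralUCB:
    """Illustrative UCB bandit on a one-hidden-layer ReLU net (a sketch, not the paper's code)."""

    def __init__(self, dim, width=16, lam=1.0, gamma=1.0, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Random initialization; the gradient at these weights acts as a feature map.
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / dim), size=(width, dim))
        self.w2 = rng.normal(0.0, np.sqrt(1.0 / width), size=width)
        self.width, self.gamma, self.lr = width, gamma, lr
        n_params = width * dim + width
        self.Z_inv = np.eye(n_params) / lam  # inverse of Z = lam * I

    def predict(self, x):
        """Network output f(x) = w2 . relu(W1 x)."""
        return float(self.w2 @ np.maximum(self.W1 @ x, 0.0))

    def grad(self, x):
        """Gradient of f(x) with respect to all parameters, flattened as [W1, w2]."""
        h = self.W1 @ x
        d_W1 = np.outer(self.w2 * (h > 0), x)
        d_w2 = np.maximum(h, 0.0)
        return np.concatenate([d_W1.ravel(), d_w2])

    def ucb(self, x):
        """Predicted reward plus an exploration bonus (assumed form)."""
        g = self.grad(x)
        bonus = self.gamma * np.sqrt(g @ self.Z_inv @ g / self.width)
        return self.predict(x) + bonus

    def select(self, contexts):
        """Pick the arm whose context has the largest upper confidence bound."""
        return int(np.argmax([self.ucb(x) for x in contexts]))

    def update(self, x, reward):
        g = self.grad(x)
        u = g / np.sqrt(self.width)
        # Sherman-Morrison rank-one update of Z^{-1} for Z <- Z + u u^T.
        Zu = self.Z_inv @ u
        self.Z_inv -= np.outer(Zu, Zu) / (1.0 + u @ Zu)
        # One SGD step on the squared error 0.5 * (f(x) - reward)^2.
        step = self.lr * (self.predict(x) - reward) * g
        k = self.W1.size
        self.W1 -= step[:k].reshape(self.W1.shape)
        self.w2 -= step[k:]


# Usage on synthetic data: a bounded reward function with additive noise.
rng = np.random.default_rng(1)
agent = TinyNeuralUCB(dim=4)
theta = rng.normal(size=4)  # hypothetical hidden parameter of the reward
for t in range(200):
    contexts = rng.normal(size=(3, 4))
    contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)
    arm = agent.select(contexts)
    reward = np.cos(contexts[arm] @ theta) + 0.1 * rng.normal()
    agent.update(contexts[arm], reward)
```

The bonus shrinks along directions of parameter space that past gradients have already covered, so rarely explored contexts keep a larger bound. Storing the full inverse of Z costs memory quadratic in the number of parameters, which is only practical for very small networks like this one.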


06/29/2021

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

Balancing exploration and exploitation (EE) is a fundamental problem in ...
12/03/2020

Neural Contextual Bandits with Deep Representation and Shallow Exploration

We study a general class of contextual bandits, where each context-actio...
01/24/2022

Learning Contextual Bandits Through Perturbed Rewards

Thanks to the power of representation learning, neural contextual bandit...
05/28/2022

Federated Neural Bandit

Recent works on neural contextual bandit have achieved compelling perfor...
10/25/2021

Linear Contextual Bandits with Adversarial Corruptions

We study the linear contextual bandit problem in the presence of adversa...
02/01/2020

Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

We consider an adversarial variant of the classic K-armed linear context...
02/23/2018

Contextual Bandits with Stochastic Experts

We consider the problem of contextual bandits with stochastic experts, w...

Code Repositories

neural_exploration

Study of NeuralUCB and regret analysis for contextual bandits with neural decision

