Neural Contextual Bandits with Deep Representation and Shallow Exploration

12/03/2020
by Pan Xu, et al.

We study a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the reward generating function is unknown. We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network (deep representation learning), and uses an upper confidence bound (UCB) approach to explore in the last linear layer (shallow exploration). We prove that under standard assumptions, our proposed algorithm achieves Õ(√T) finite-time regret, where T is the learning time horizon. Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
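The split between deep representation and shallow exploration can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the hidden ReLU weights are random and kept frozen (the abstract describes the representation as learned), the UCB step is a standard ridge-regression bonus on the last-layer features, and every name here (`relu_features`, `LastLayerUCB`, `run_demo`, `alpha`, `lam`, `theta_star`) is hypothetical.

```python
import numpy as np


def relu_features(x, weights):
    """Pass a raw feature vector through the hidden ReLU layers and
    return the last hidden layer's output (the 'deep representation')."""
    h = x
    for W in weights:
        h = np.maximum(W @ h, 0.0)
    return h


class LastLayerUCB:
    """Ridge-regression UCB over last-layer features only (assumed form)."""

    def __init__(self, dim, lam=1.0, alpha=1.0):
        self.A = lam * np.eye(dim)   # regularized Gram matrix of chosen features
        self.b = np.zeros(dim)       # reward-weighted sum of chosen features
        self.alpha = alpha           # exploration weight (assumed constant)

    def select(self, feats):
        """feats has shape (K, dim); returns the index of the arm with the highest UCB."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # Per-arm quadratic form phi^T A^{-1} phi, clipped for numerical safety.
        quad = np.einsum("kd,de,ke->k", feats, A_inv, feats)
        bonus = np.sqrt(np.maximum(quad, 0.0))
        return int(np.argmax(feats @ theta + self.alpha * bonus))

    def update(self, feat, reward):
        self.A += np.outer(feat, feat)
        self.b += reward * feat


def run_demo(T=50, K=4, d=5, m=8, seed=0):
    """Simulate T rounds with K arms, raw dimension d, hidden width m."""
    rng = np.random.default_rng(seed)
    # Frozen random hidden layers: a simplification made only for this sketch.
    weights = [rng.normal(size=(m, d)) / np.sqrt(d),
               rng.normal(size=(m, m)) / np.sqrt(m)]
    theta_star = rng.normal(size=m)  # hypothetical true last-layer parameter
    bandit = LastLayerUCB(dim=m)
    arms = []
    for _ in range(T):
        contexts = rng.normal(size=(K, d))
        feats = np.stack([relu_features(x, weights) for x in contexts])
        a = bandit.select(feats)
        reward = feats[a] @ theta_star + 0.1 * rng.normal()
        bandit.update(feats[a], reward)
        arms.append(a)
    return arms, bandit
```

Calling `run_demo()` returns the list of chosen arms and the final bandit state. The confidence set is kept only over the `m`-dimensional last layer, so each round inverts an `m × m` matrix rather than one sized by the full parameter count of the network, which is the efficiency argument the abstract makes.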


research
11/11/2019

Neural Contextual Bandits with Upper Confidence Bound-Based Exploration

We study the stochastic contextual bandit problem, where the reward is g...
research
10/24/2022

Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

We study the problem of representation learning in stochastic contextual...
research
06/29/2021

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

Balancing exploration and exploitation (EE) is a fundamental problem in ...
research
12/19/2022

On the Complexity of Representation Learning in Contextual Linear Bandits

In contextual linear bandits, the reward function is assumed to be a lin...
research
01/24/2022

Learning Contextual Bandits Through Perturbed Rewards

Thanks to the power of representation learning, neural contextual bandit...
research
02/07/2021

Online Limited Memory Neural-Linear Bandits with Likelihood Matching

We study neural-linear bandits for solving problems where both explorati...
research
04/08/2021

Leveraging Good Representations in Linear Contextual Bandits

The linear contextual bandit literature is mostly focused on the design ...
