Neural Exploitation and Exploration of Contextual Bandits

05/05/2023
by Yikun Ban, et al.

In this paper, we study the use of neural networks for both the exploitation and the exploration of contextual multi-armed bandits. Contextual multi-armed bandits have been studied for decades across a wide range of applications. Three main techniques address the exploitation-exploration trade-off in bandits: epsilon-greedy, Thompson Sampling (TS), and Upper Confidence Bound (UCB). Recent literature has proposed a series of neural bandit algorithms that adapt to non-linear reward functions, combined with TS or UCB strategies for exploration. In this paper, instead of calculating a large-deviation-based statistical bound for exploration as previous methods do, we propose "EE-Net," a novel neural-based exploitation and exploration strategy. In addition to a neural network (the Exploitation network) that learns the reward function, EE-Net uses a second neural network (the Exploration network) to adaptively learn the potential gain over the currently estimated reward, which drives exploration. We provide an instance-based 𝒪(√T) regret upper bound for EE-Net and show that it outperforms related linear and neural contextual bandit baselines on real-world datasets.
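The core idea can be sketched in a few lines: score each arm with an exploitation network plus an exploration network, then pull the arm with the highest combined score. The sketch below is a minimal illustration under simplifying assumptions, not the paper's exact method: the networks are tiny untrained two-layer MLPs, and the exploration network here takes the raw context as input (in EE-Net it takes the gradient of the exploitation network instead).

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(d_in, d_hidden, d_out=1):
    # Tiny two-layer network; the widths and init are illustrative only.
    return {
        "W1": rng.normal(0, 1.0 / np.sqrt(d_in), (d_hidden, d_in)),
        "W2": rng.normal(0, 1.0 / np.sqrt(d_hidden), (d_out, d_hidden)),
    }

def mlp_forward(net, x):
    h = np.maximum(net["W1"] @ x, 0.0)   # ReLU hidden layer
    return float(net["W2"] @ h)

def select_arm(contexts, f1, f2):
    # EE-Net-style selection (sketch): exploitation estimate f1(x) plus
    # exploration estimate f2(x) of the potential gain; pick the argmax.
    # Assumption of this sketch: f2 consumes the raw context, whereas the
    # paper feeds it the gradient of f1 with respect to its parameters.
    scores = [mlp_forward(f1, x) + mlp_forward(f2, x) for x in contexts]
    return int(np.argmax(scores))

d = 4
f1 = mlp_init(d, 16)                      # exploitation network
f2 = mlp_init(d, 16)                      # exploration network
contexts = [rng.normal(size=d) for _ in range(5)]
arm = select_arm(contexts, f1, f2)        # index of the arm to pull
```

In the full algorithm both networks are trained online: f1 on the observed rewards, and f2 on the residual between the observed reward and f1's prediction, so the exploration score shrinks as the reward estimate improves.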


Related research

- EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits (10/07/2021)
- Hyper-parameter Tuning for the Contextual Bandit (05/04/2020)
- Multi-facet Contextual Bandits: A Neural Network Perspective (06/06/2021)
- Deep Contextual Multi-armed Bandits (07/25/2018)
- Contextual Bandits Evolving Over Finite Time (11/14/2019)
- Tuning Confidence Bound for Stochastic Bandits with Bandit Distance (10/06/2021)
- Maximum entropy exploration in contextual bandits with neural networks and energy based models (10/12/2022)
