Deep Upper Confidence Bound Algorithm for Contextual Bandit Ranking of Information Selection

10/08/2021
by Michael Rawson, et al.

Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework, in which the top-K arms are chosen iteratively to maximize a reward. The context, which represents a set of observable factors related to the user, is used to increase prediction accuracy compared to a standard multi-armed bandit. Contextual bandit methods have mostly been studied under strict linearity assumptions; we drop that assumption and learn non-linear stochastic reward functions with deep neural networks. We introduce a novel algorithm, the Deep Upper Confidence Bound (UCB) algorithm. Deep UCB balances exploration and exploitation with a separate neural network that models the learning convergence. We compare the performance of many bandit algorithms while varying K over real-world data sets with high-dimensional data and non-linear reward functions. Empirical results show that Deep UCB often outperforms the other methods, though its performance is sensitive to the problem and reward setup. Additionally, we prove theoretical regret bounds on Deep UCB, giving convergence to optimality for the weak class of CMAB problems.
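The abstract does not spell out the algorithm's details, so below is a minimal Python sketch of a Deep UCB-style top-K contextual bandit under stated assumptions: one network estimates each arm's expected reward given the context, and a separate network is trained to predict the reward network's remaining error, which serves as an exploration bonus that shrinks as learning converges. The architectures, the synthetic non-linear reward, and all hyperparameters (ALPHA, K, N_ARMS, and so on) are illustrative choices, not the authors' exact construction.

```python
"""Minimal sketch of a Deep UCB-style top-K contextual bandit.

All names and settings here are illustrative assumptions based on the
abstract, not the paper's exact method.
"""
import torch
import torch.nn as nn

torch.manual_seed(0)
N_ARMS, CTX_DIM, K, ROUNDS, ALPHA = 20, 8, 3, 500, 1.0

def mlp(in_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

arm_emb = torch.randn(N_ARMS, CTX_DIM)   # fixed per-arm feature vectors
reward_net = mlp(2 * CTX_DIM)             # predicts expected reward
bonus_net = mlp(2 * CTX_DIM)              # predicts |prediction error| (UCB bonus)
opt_r = torch.optim.Adam(reward_net.parameters(), lr=1e-2)
opt_b = torch.optim.Adam(bonus_net.parameters(), lr=1e-2)

def true_reward(ctx, arm):
    # hypothetical non-linear stochastic reward used only for simulation
    return torch.sigmoid((ctx * arm_emb[arm]).sum() ** 2 / CTX_DIM) + 0.05 * torch.randn(())

total = 0.0
for t in range(ROUNDS):
    ctx = torch.randn(CTX_DIM)
    feats = torch.cat([ctx.expand(N_ARMS, -1), arm_emb], dim=1)  # (arms, 2*CTX_DIM)
    with torch.no_grad():
        score = reward_net(feats).squeeze(1) + ALPHA * bonus_net(feats).squeeze(1)
    top_k = torch.topk(score, K).indices                          # select top-K arms

    rewards = torch.stack([true_reward(ctx, a.item()) for a in top_k])
    total += rewards.sum().item()

    # update the reward network on the observed (context, arm, reward) triples
    pred = reward_net(feats[top_k]).squeeze(1)
    loss_r = nn.functional.mse_loss(pred, rewards)
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()

    # the second network learns how wrong the reward network still is;
    # as learning converges this error, and hence the exploration bonus, shrinks
    with torch.no_grad():
        err = (reward_net(feats[top_k]).squeeze(1) - rewards).abs()
    loss_b = nn.functional.mse_loss(bonus_net(feats[top_k]).squeeze(1), err)
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()

print(f"average reward per selected arm: {total / (ROUNDS * K):.3f}")
```

Scoring each arm by predicted reward plus a learned error term and taking the top K mirrors the classical UCB exploration-exploitation trade-off while keeping both terms non-linear in the context.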


