Multi-facet Contextual Bandits: A Neural Network Perspective

06/06/2021
by   Yikun Ban, et al.

The contextual multi-armed bandit has been shown to be an effective tool in recommender systems. In this paper, we study a novel problem of multi-facet bandits involving a group of bandits, each characterizing the users' needs from one unique aspect. In each round, for the given user, we need to select one arm from each bandit such that the combination of all chosen arms maximizes the final reward. This problem has immediate applications in E-commerce, healthcare, and other domains. To address it, we propose a novel algorithm, named MuFasa, which utilizes an assembled neural network to jointly learn the underlying reward functions of multiple bandits. It estimates an Upper Confidence Bound (UCB) linked with the expected reward to balance exploitation and exploration. Under mild assumptions, we provide a regret analysis of MuFasa: it achieves the near-optimal 𝒪((K+1)√T) regret bound, where K is the number of bandits and T is the number of played rounds. Furthermore, we conduct extensive experiments showing that MuFasa outperforms strong baselines on real-world data sets.
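The setting described above — pick one arm per facet, observe one combined reward, and explore via a UCB — can be illustrated with a minimal sketch. This is not the paper's MuFasa algorithm: in place of the assembled neural network it uses a LinUCB-style ridge regression over the concatenated arm contexts, and it selects arms per facet greedily, which is an approximation. The class name `MultiFacetLinUCB` and all parameters are hypothetical.

```python
import numpy as np

class MultiFacetLinUCB:
    """Simplified multi-facet bandit: one arm per facet each round.

    Hypothetical sketch. The paper's MuFasa jointly learns the reward
    functions with an assembled neural network; here a shared linear
    reward model over the concatenated per-facet contexts stands in.
    """

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha               # exploration strength
        self.A = lam * np.eye(dim)       # ridge-regularized covariance
        self.b = np.zeros(dim)           # reward-weighted context sum

    def select(self, facets):
        """facets: list of K arrays, each of shape (n_arms_k, d_k).

        Picks one arm per facet by maximizing the UCB of the partially
        built concatenated context. For a linear reward the mean term
        decomposes across facets; the greedy pass over the confidence
        term is an approximation to the joint argmax.
        """
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        x = np.zeros(self.b.shape[0])    # concatenated chosen contexts
        choice, offset = [], 0
        for arms in facets:
            d = arms.shape[1]
            best, best_ucb = 0, -np.inf
            for i, a in enumerate(arms):
                cand = x.copy()
                cand[offset:offset + d] = a
                ucb = cand @ theta + self.alpha * np.sqrt(cand @ A_inv @ cand)
                if ucb > best_ucb:
                    best, best_ucb = i, ucb
            x[offset:offset + d] = arms[best]
            choice.append(best)
            offset += d
        return choice, x

    def update(self, x, reward):
        """Rank-one update with the combined reward of the K chosen arms."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

A short simulation with two 2-dimensional facets and a noisy linear true reward shows the intended loop: `select` one arm per facet, observe a single combined reward, then `update`.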


Related research

- Neural Exploitation and Exploration of Contextual Bandits (05/05/2023)
- Deep Upper Confidence Bound Algorithm for Contextual Bandit Ranking of Information Selection (10/08/2021)
- Neural Bandit with Arm Group Graph (06/08/2022)
- Optimal Algorithms for Latent Bandits with Cluster Structure (01/17/2023)
- Exploring Partially Observed Networks with Nonparametric Bandits (04/19/2018)
- Tuning Confidence Bound for Stochastic Bandits with Bandit Distance (10/06/2021)
- Privacy-Preserving Contextual Bandits (10/11/2019)
