Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

08/03/2020
by Dalin Guo, et al.

Recommender systems trained in a continuous learning fashion are plagued by the feedback loop problem, also known as algorithmic bias. This causes a newly trained model to act greedily and favor items that users have already engaged with. This behavior is particularly harmful in personalized ad recommendations, as it can also cause new campaigns to remain unexplored. Exploration aims to address this limitation by providing new information about the environment, which encompasses user preferences, and can lead to higher long-term reward. In this work, we formulate a display advertising recommender as a contextual bandit and implement exploration techniques that sample from the posterior distribution of click-through rates in a computationally tractable manner. Traditional large-scale deep learning models do not provide uncertainty estimates by default. We approximate these uncertainty measurements of the predictions by employing a bootstrapped model with multiple heads and dropout units. We benchmark a number of different models in an offline simulation environment using a publicly available dataset of user-ad engagements. We test our proposed deep Bayesian bandits algorithm in the offline simulation and in an online A/B setting with large-scale production traffic, where we demonstrate a positive gain from our exploration model.
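To make the mechanism concrete, below is a minimal, hypothetical sketch (in PyTorch, not the authors' production code) of the approach the abstract describes: a shared network with several bootstrap heads plus dropout units, queried Thompson-sampling style so that each ad request scores candidates with a single sampled draw of the click-through rate. All names (BootstrappedCTRModel, select_ad), layer sizes, and hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn

class BootstrappedCTRModel(nn.Module):
    """Shared body with K bootstrap heads; dropout adds further posterior noise."""

    def __init__(self, n_features: int, n_heads: int = 10, p_drop: float = 0.1):
        super().__init__()
        # Shared representation of the (user, ad) context features.
        self.body = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.ReLU(),
            nn.Dropout(p_drop),  # left active at serving time (MC dropout)
        )
        # Each head would be trained on its own bootstrap resample of the
        # logged engagements, so heads disagree most where data is scarce.
        self.heads = nn.ModuleList([nn.Linear(128, 1) for _ in range(n_heads)])

    def forward(self, x: torch.Tensor, head: int) -> torch.Tensor:
        return torch.sigmoid(self.heads[head](self.body(x))).squeeze(-1)

def select_ad(model: BootstrappedCTRModel, candidates: torch.Tensor) -> int:
    """One Thompson-sampling step: draw a single sampled CTR per candidate ad."""
    model.train()  # keep dropout stochastic so each pass is a posterior draw
    head = int(torch.randint(len(model.heads), (1,)))  # pick one head at random
    with torch.no_grad():
        sampled_ctr = model(candidates, head)
    return int(sampled_ctr.argmax())  # serve the ad with the highest sampled CTR

# Usage: pick among 5 candidate ads, each described by 32 context features.
model = BootstrappedCTRModel(n_features=32)
chosen = select_ad(model, torch.randn(5, 32))

Choosing a random head per request and leaving dropout active at serving time are what turn a point-estimate CTR model into an approximate posterior sampler: ads with little feedback receive widely varying sampled CTRs across requests and are therefore occasionally served, which is precisely the exploration behavior the paper targets.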


Related research

04/29/2021 · Online certification of preference-based fairness for personalized recommender systems
We propose to assess the fairness of personalized recommender systems in...

04/05/2023 · Optimism Based Exploration in Large-Scale Recommender Systems
Bandit learning algorithms have been an increasingly popular design choi...

06/26/2022 · Two-Stage Neural Contextual Bandits for Personalised News Recommendation
We consider the problem of personalised news recommendation where each u...

09/28/2020 · Position-Based Multiple-Play Bandits with Thompson Sampling
Multiple-play bandits aim at displaying relevant items at relevant posit...

10/23/2021 · Towards the D-Optimal Online Experiment Design for Recommender Selection
Selecting the optimal recommender via online exploration-exploitation is...

07/18/2021 · GuideBoot: Guided Bootstrap for Deep Contextual Bandits
The exploration/exploitation (E&E) dilemma lies at the core of interac...

11/25/2020 · Exploration in Online Advertising Systems with Deep Uncertainty-Aware Learning
Modern online advertising systems inevitably rely on personalization met...
