Optimism Based Exploration in Large-Scale Recommender Systems

04/05/2023
by Hongbo Guo, et al.

Bandit learning algorithms have become an increasingly popular design choice for recommender systems. Despite strong interest in bandit learning from the community, multiple bottlenecks still prevent many bandit learning approaches from reaching production. Two of the most important are scaling to multi-task settings and A/B testing. Classic bandit algorithms, especially those leveraging contextual information, often require reward for uncertainty estimation, which hinders their adoption in multi-task recommender systems. Moreover, unlike supervised learning algorithms, bandit learning algorithms place great emphasis on the data collection process through their explorative nature. Such explorative behavior leads to unfair evaluation of bandit learning agents in a classic A/B test setting. In this work, we present a novel design of the production bandit learning life-cycle for recommender systems, along with a novel set of metrics to measure their efficiency in user exploration. We show through large-scale production recommender system experiments and in-depth analysis that our bandit agent design improves personalization for the production recommender system and that our experiment design fairly evaluates the performance of bandit learning algorithms.

research
06/26/2023

Scalable Neural Contextual Bandit for Recommender Systems

High-quality recommender systems ought to deliver both innovative and re...
research
06/04/2019

Toward Building Conversational Recommender Systems: A Contextual Bandit Approach

Contextual bandit algorithms have gained increasing popularity in recomm...
research
08/03/2020

Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

Recommender systems trained in a continuous learning fashion are plagued...
research
09/13/2022

Inclusive Ethical Design for Recommender Systems

Recommender systems are becoming increasingly central as mediators of in...
research
10/06/2021

Optimized Recommender Systems with Deep Reinforcement Learning

Recommender Systems have been the cornerstone of online retailers. Tradi...
research
05/14/2014

Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques

In many recommendation applications such as news recommendation, the ite...
research
06/27/2012

Hierarchical Exploration for Accelerating Contextual Bandits

Contextual bandit learning is an increasingly popular approach to optimi...
