Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems

08/09/2022
by   Qihua Zhang, et al.

Recommender systems (RS) are an important class of online applications that affect billions of users every day. The mainstream RS ranking framework is composed of two parts: a Multi-Task Learning (MTL) model that predicts various kinds of user feedback, e.g., clicks, likes, and shares, and a Multi-Task Fusion (MTF) model that combines the multi-task outputs into one final ranking score with respect to user satisfaction. The fusion model has received little research attention, even though, as the last crucial step of ranking, it has a great impact on the final recommendation. To optimize long-term user satisfaction rather than greedily pursue instant returns, we formulate the MTF task as a Markov Decision Process (MDP) within a recommendation session and propose a Batch Reinforcement Learning (RL) based Multi-Task Fusion framework (BatchRL-MTF) that includes a Batch RL framework and an online exploration component. The former exploits Batch RL to learn an optimal recommendation policy offline from fixed batch data for long-term user satisfaction, while the latter explores potentially high-value actions online to escape local optima. Based on a comprehensive investigation of user behaviors, we model the user satisfaction reward with subtle heuristics from two aspects: user stickiness and user activeness. Finally, we conduct extensive experiments on a billion-sample real-world dataset to show the effectiveness of our model. We propose a conservative offline policy estimator (Conservative-OPEstimator) to test our model offline. Furthermore, we conduct online experiments in a real recommendation environment to compare the performance of different models. As one of the few Batch RL studies successfully applied to the MTF task, our model has also been deployed on a large-scale industrial short video platform, serving hundreds of millions of users.
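To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical fusion form in which the final score is a product of the MTL predictions raised to per-task weights (the weights playing the role of the action chosen by the fusion policy), and a standard discounted sum of per-step rewards as the session-level objective. The names fuse_scores, rank, and discounted_return, the task keys, and the discount factor are all illustrative assumptions; the abstract does not specify the fusion formula or the reward definition.

```python
# Illustrative sketch only: the fusion formula, names, and values are assumptions,
# not taken from the paper.

def fuse_scores(predictions, weights):
    """Combine per-task predictions into one ranking score.

    Assumed form: score = product over tasks of prediction ** weight.
    A task with no weight entry gets exponent 0 and so does not affect the score.
    """
    score = 1.0
    for task, p in predictions.items():
        score *= p ** weights.get(task, 0.0)
    return score


def rank(candidates, weights):
    """Sort candidate items by fused score, highest first."""
    return sorted(
        candidates,
        key=lambda c: fuse_scores(c["predictions"], weights),
        reverse=True,
    )


def discounted_return(rewards, gamma=0.9):
    """Session-level return: sum over steps t of gamma**t * rewards[t]."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    candidates = [
        {"id": "a", "predictions": {"click": 0.9, "like": 0.1}},
        {"id": "b", "predictions": {"click": 0.5, "like": 0.5}},
    ]
    # Two hypothetical actions (weight vectors) produce different rankings.
    print([c["id"] for c in rank(candidates, {"click": 1.0, "like": 0.0})])
    print([c["id"] for c in rank(candidates, {"click": 1.0, "like": 1.0})])
    # A greedy policy would look only at the first reward; the MDP view
    # scores the whole session.
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

In this sketch, changing the weight vector changes which item ranks first, which is why the choice of fusion weights can be treated as an action whose value is judged by the return over the whole session rather than by the immediate reward.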

research
08/27/2023

CTR is not Enough: a Novel Reinforcement Learning based Ranking Approach for Optimizing Session Clicks

Ranking is a crucial module used in the recommender system. In particul...
research
10/26/2021

Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of Tasks with Multi-dimensional Relations

There have been many studies on improving the efficiency of shared learni...
research
02/13/2021

Sequential Recommendation in Online Games with Multiple Sequences, Tasks and User Levels

Online gaming is a multi-billion-dollar industry, which is growing faste...
research
01/27/2020

Developing Multi-Task Recommendations with Long-Term Rewards via Policy Distilled Reinforcement Learning

With the explosive growth of online products and content, recommendation...
research
04/18/2021

Deep Latent Emotion Network for Multi-Task Learning

Feed recommendation models are widely adopted by numerous feed platforms...
research
06/01/2021

Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL

We study session-based recommendation scenarios where we want to recomme...
research
10/15/2021

Value Penalized Q-Learning for Recommender Systems

Scaling reinforcement learning (RL) to recommender systems (RS) is promi...
