Constrained Reinforcement Learning for Short Video Recommendation

05/26/2022
by Qingpeng Cai, et al.

The wide popularity of short videos on social media creates new opportunities and challenges for optimizing recommender systems on video-sharing platforms. Users respond to recommendations in complex, multi-faceted ways, including watch time and various types of interactions with videos. As a result, established recommendation algorithms that optimize a single objective are inadequate for optimizing the overall user experience. In this paper, we formulate short video recommendation as a constrained Markov Decision Process (MDP), in which the platform optimizes the main goal of long-term user watch time, subject to the constraint of accommodating auxiliary responses, that is, user interactions such as sharing or downloading videos. To solve the constrained MDP, we propose a two-stage reinforcement learning approach based on the actor-critic framework. At stage one, we learn an individual policy to optimize each auxiliary response. At stage two, we learn a policy to (i) optimize the main response and (ii) stay close to the policies learned at the first stage, which effectively guarantees the performance of this main policy on the auxiliaries. Through extensive simulations, we demonstrate the effectiveness of our approach over alternatives, both in optimizing the main goal and in balancing the others. We further show the advantage of our approach in live experiments on short video recommendation, where it significantly outperforms the baselines in terms of watch time and interactions from video views. Our approach has been fully launched in the production system to optimize user experiences on the platform.
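The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: a single-state problem with four candidate videos instead of a full MDP, exact policy gradients instead of an actor-critic with a learned critic, and a KL penalty as one possible reading of "stay close to" the stage-one policies. The reward vectors `watch` and `share`, the function `train_policy`, and the weight `lam` are all hypothetical names and values chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def train_policy(reward, anchors=(), lam=0.0, steps=2000, lr=0.5):
    """Exact gradient ascent for a softmax policy over discrete actions.

    Objective (an assumption of this sketch, not taken from the paper):
        E_pi[reward] - lam * sum_k KL(pi || anchor_k)
    """
    logits = np.zeros_like(reward, dtype=float)
    for _ in range(steps):
        pi = softmax(logits)
        # Per-action signal: reward minus the KL penalty toward each anchor.
        signal = reward.astype(float).copy()
        for anchor in anchors:
            log_pi = np.log(np.clip(pi, 1e-12, 1.0))
            log_anchor = np.log(np.clip(anchor, 1e-12, 1.0))
            signal -= lam * (log_pi - log_anchor)
        # Gradient of the objective w.r.t. the softmax logits.
        logits += lr * pi * (signal - pi @ signal)
    return softmax(logits)

# Hypothetical expected rewards for four candidate videos.
watch = np.array([1.0, 0.8, 0.3, 0.1])  # main response: watch time
share = np.array([0.0, 0.6, 0.1, 0.2])  # auxiliary response: sharing

# Stage one: a separate policy that optimizes the auxiliary response only.
share_policy = train_policy(share)

# Baseline for comparison: optimize watch time with no constraint.
main_only = train_policy(watch)

# Stage two: optimize watch time while staying close to the stage-one policy.
constrained = train_policy(watch, anchors=[share_policy], lam=0.5)

print("watch time:", main_only @ watch, constrained @ watch)
print("sharing:   ", main_only @ share, constrained @ share)
```

In this toy setting the unconstrained policy concentrates on the video with the highest watch time and earns almost no shares, while the stage-two policy gives up some watch time to keep the sharing response high. The penalty weight `lam` controls that trade-off; how the paper sets the corresponding quantity is not stated in the abstract.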


research
02/03/2023

Two-Stage Constrained Actor-Critic for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunit...
research
02/03/2023

Reinforcing User Retention in a Billion Scale Short Video Recommender System

Recently, short video platforms have achieved rapid user growth by recom...
research
09/09/2021

User Tampering in Reinforcement Learning Recommender Systems

This paper provides the first formalisation and empirical demonstration ...
research
08/20/2021

Reinforcement Learning to Optimize Lifetime Value in Cold-Start Recommendation

Recommender system plays a crucial role in modern E-commerce platform. D...
research
02/07/2023

Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective

We study the problem of optimizing a recommender system for outcomes tha...
research
06/13/2022

Deconfounding Duration Bias in Watch-time Prediction for Video Recommendation

Watch-time prediction remains to be a key factor in reinforcing user eng...
research
03/25/2023

The Challenges of Studying Misinformation on Video-Sharing Platforms During Crises and Mass-Convergence Events

Mis- and disinformation can spread rapidly on video-sharing platforms (V...
