ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor

06/01/2022
by Wanqi Xue, et al.

Long-term engagement is preferred over immediate engagement in sequential recommendation because it directly affects product operational metrics such as daily active users (DAUs) and dwell time. Reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation. However, because online interactions are expensive, it is very difficult for RL algorithms to perform state-action value estimation, exploration, and feature extraction when optimizing long-term engagement. In this paper, we propose ResAct, which seeks a policy that is close to, but better than, the online-serving policy. In this way, sufficient data can be collected near the learned policy so that state-action values can be properly estimated, and there is no need to perform online exploration. Directly optimizing this policy is difficult due to the huge policy space. ResAct instead solves the problem by first reconstructing the online behaviors and then improving them. Our main contributions are fourfold. First, we design a generative model which reconstructs behaviors of the online-serving policy by sampling multiple action estimators. Second, we design an effective learning paradigm to train the residual actor, which outputs the residual used for action improvement. Third, we facilitate feature extraction with two information-theoretical regularizers that ensure the expressiveness and conciseness of the features. Fourth, we conduct extensive experiments on a real-world dataset consisting of millions of sessions, and our method significantly outperforms state-of-the-art baselines on a variety of long-term engagement optimization tasks.
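The reconstruct-then-improve idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy online policy, the clipped residual, the toy critic, and every name (`reconstruct_actions`, `residual_actor`, `critic`, `select_action`, `max_residual`) are hypothetical stand-ins, and picking the final action by critic value is an assumption not stated in the abstract. The only point it shows is that the improved action is a reconstructed online action plus a small residual, so it stays close to the online-serving policy.

```python
import random

def reconstruct_actions(state, n_samples, rng):
    """Hypothetical stand-in for the generative model: draw several
    estimates of the action the online-serving policy would take."""
    behavior_action = [0.5 * s for s in state]  # toy online policy (assumption)
    return [[a + rng.gauss(0.0, 0.05) for a in behavior_action]
            for _ in range(n_samples)]

def residual_actor(state, action, max_residual=0.1):
    """Hypothetical residual actor: a correction toward a toy target,
    clipped per coordinate so the result stays near the online action."""
    target = [0.6 * s for s in state]  # toy "better" action (assumption)
    return [max(-max_residual, min(max_residual, t - a))
            for t, a in zip(target, action)]

def critic(state, action):
    """Toy state-action value: negative squared distance to the toy target."""
    return -sum((0.6 * s - a) ** 2 for s, a in zip(state, action))

def select_action(state, n_samples=5, seed=0):
    """Reconstruct several online actions, add a residual to each, and
    return the (online, improved) pair whose improved action scores best."""
    rng = random.Random(seed)
    pairs = []
    for a_on in reconstruct_actions(state, n_samples, rng):
        delta = residual_actor(state, a_on)
        a_new = [a + d for a, d in zip(a_on, delta)]
        pairs.append((a_on, a_new))
    return max(pairs, key=lambda pair: critic(state, pair[1]))

state = [1.0, -2.0]
a_on, a_new = select_action(state)
print("reconstructed:", a_on)
print("improved:     ", a_new)
```

In this sketch the clipping bound `max_residual` plays the role of the "close to" constraint: a smaller bound keeps the improved action nearer to data the online policy already generates, at the cost of a smaller possible improvement.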
