Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL

by Bogdan Mazoure, et al.

We study session-based recommendation scenarios in which we want to recommend items to users during sequential interactions to improve their long-term utility. Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system. Immediately measurable proxies such as clicks can lead to suboptimal recommendations because they are misaligned with the long-term metric. Many works have applied episodic reinforcement learning (RL) techniques to session-based recommendation, but these methods do not account for policy-induced drift in user intent across sessions. We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions. By varying the horizon hyper-parameter in SHPI, we recover well-known policy improvement schemes from the RL literature. Empirical results on four recommendation tasks show that SHPI can outperform matrix factorization, offline bandit, and offline RL baselines. We also provide a stable and computationally efficient implementation using weighted regression oracles.
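The abstract does not spell out SHPI's update rule, so the following is only a generic illustration under stated assumptions, not the authors' algorithm. It sketches how a horizon hyper-parameter can interpolate between a myopic, bandit-like target (horizon 1, immediate reward only) and a longer return within a logged session, and then uses exponentiated-advantage weights with a tabular weighted maximum-likelihood fit as a stand-in for a "weighted regression oracle". The function names (`truncated_returns`, `advantage_weights`, `weighted_mle_policy`), the weight clipping, and the toy data are all hypothetical. In particular, the sketch ignores the cross-session distribution shift that SHPI is designed to approximate.

```python
import math
from collections import defaultdict
from typing import Dict, List


def truncated_returns(rewards: List[float], horizon: int, gamma: float = 1.0) -> List[float]:
    """Hypothetical helper: k-step return for each step of ONE logged session.

    horizon=1 keeps only the immediate reward (a myopic, bandit-like target);
    horizon >= len(rewards) gives the full discounted return to session end.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    n = len(rewards)
    returns = []
    for t in range(n):
        steps = min(horizon, n - t)  # truncate at the end of the session
        returns.append(sum((gamma ** k) * rewards[t + k] for k in range(steps)))
    return returns


def advantage_weights(returns: List[float], baseline: float,
                      temperature: float = 1.0, max_weight: float = 20.0) -> List[float]:
    """Exponentiated-advantage weights, clipped so a few large returns cannot
    dominate the fit (an AWR-style choice, assumed here for illustration)."""
    return [min(math.exp((g - baseline) / temperature), max_weight) for g in returns]


def weighted_mle_policy(items: List[str], weights: List[float]) -> Dict[str, float]:
    """Tabular stand-in for a weighted regression oracle: the categorical
    distribution maximizing sum_i w_i * log pi(item_i) assigns each item a
    probability proportional to the total weight of its logged occurrences."""
    totals: Dict[str, float] = defaultdict(float)
    for item, w in zip(items, weights):
        totals[item] += w
    z = sum(totals.values())
    return {item: v / z for item, v in totals.items()}


if __name__ == "__main__":
    items = ["a", "b", "a", "c"]    # recommended items in one toy session
    rewards = [0.0, 1.0, 0.0, 2.0]  # made-up per-step feedback
    for h in (1, 2, 10):
        g = truncated_returns(rewards, horizon=h)
        w = advantage_weights(g, baseline=sum(g) / len(g))
        print(h, g, weighted_mle_policy(items, w))
```

With `horizon=1`, item "a" is judged only by its own immediate reward (zero in this toy session); as the horizon grows, early recommendations are also credited with rewards observed later in the same session.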

Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation

Reinforcement learning (RL) has shown great promise in optimizing long-t...

Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation

Most of the existing deep reinforcement learning (RL) approaches for ses...

Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems

Recommender System (RS) is an important online application that affects ...

Sequential Search with Off-Policy Reinforcement Learning

Recent years have seen a significant amount of interests in Sequential R...

Personalization for Web-based Services using Offline Reinforcement Learning

Large-scale Web-based services present opportunities for improving UI po...

Generative Slate Recommendation with Reinforcement Learning

Recent research has employed reinforcement learning (RL) algorithms to o...

Constrained Reinforcement Learning for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunit...