From Clicks to Conversions: Recommendation for long-term reward

09/01/2020
by Philomène Chagniot, et al.

Recommender systems are often optimised for short-term reward: a recommendation is considered successful if a reward (e.g. a click) is observed immediately after it. The advantage of this framework is that, under some reasonable (although questionable) assumptions, it allows familiar supervised learning tools to be applied to the recommendation task. However, it also means that long-term business metrics, such as sales or retention, are ignored. In this paper we introduce a framework for modelling long-term rewards in the RecoGym simulation environment. We use this new functionality to showcase the problems caused by the last-click attribution scheme in conversion-optimised recommendation, and propose a simple extension that leads to state-of-the-art results.

research
12/06/2022

PrefRec: Preference-based Recommender Systems for Reinforcing Long-term User Engagement

Current advances in recommender systems have been remarkably successful ...
research
01/27/2020

Developing Multi-Task Recommendations with Long-Term Rewards via Policy Distilled Reinforcement Learning

With the explosive growth of online products and content, recommendation...
research
07/19/2023

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

Recommender systems are a ubiquitous feature of online platforms. Increa...
research
04/01/2019

Enhancing the long-term performance of recommender system

Recommender system is a critically important tool in online commercial s...
research
08/20/2018

Dynamic Intention-Aware Recommendation with Self-Attention

Predicting the missing values given the observed interaction matrix is a...
research
02/04/2014

Short-term plasticity as cause-effect hypothesis testing in distal reward learning

Asynchrony, overlaps and delays in sensory-motor signals introduce ambig...
research
07/24/2018

Learning from Delayed Outcomes with Intermediate Observations

Optimizing for long term value is desirable in many practical applicatio...
