Learning from Delayed Outcomes with Intermediate Observations

07/24/2018
by Timothy A. Mann, et al.

Optimizing for long-term value is desirable in many practical applications, such as recommender systems. The most common approach is supervised learning with long-term value as the target. Unfortunately, long-term metrics take a long time to measure (e.g., will customers finish reading an ebook?), and vanilla forecasters cannot learn from an example until its outcome is observed. In practical systems where new items arrive frequently, such delay can increase the training-serving skew, thereby degrading the model's predictions for new products. We argue that intermediate observations (e.g., whether customers read a third of the book within 24 hours) can improve a model's predictions. We formalize the problem as a semi-stochastic model, where instances are selected by an adversary but, given an instance, the intermediate observation and the outcome are sampled from a factored joint distribution. We propose an algorithm that exploits intermediate observations and theoretically quantify how much it can outperform any prediction method that ignores them. Motivated by the theoretical analysis, we propose two neural network architectures: the Factored Forecaster (FF), which is ideal if our assumptions are satisfied, and the Residual Factored Forecaster (RFF), which is more robust to model misspecification. Experiments on two real-world datasets, one derived from GitHub repositories and the other from a popular marketplace, show that RFF outperforms both FF and an algorithm that ignores intermediate observations.
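To make the idea of learning from intermediate observations concrete, here is a minimal tabular sketch. It is not the paper's neural FF architecture. It assumes one possible reading of "factored joint distribution", namely P(y, z | x) = P(z | x) P(y | z), with a discrete intermediate observation z and a binary outcome y; the abstract does not spell out the factorization. The class name, method names, and example data are all hypothetical.

```python
class TabularFactoredForecaster:
    """Toy forecaster. ASSUMPTION: P(y, z | x) = P(z | x) * P(y | z)."""

    def __init__(self, num_z, smoothing=1.0):
        self.num_z = num_z          # number of discrete intermediate values
        self.smoothing = smoothing  # additive (Laplace) smoothing constant
        self.z_counts = {}          # z_counts[x][z]: times x produced z
        self.y_sum = [0.0] * num_z  # sum of binary outcomes seen per z
        self.y_n = [0.0] * num_z    # number of outcomes seen per z

    def observe_intermediate(self, x, z):
        # Available soon after serving x, long before the outcome arrives.
        self.z_counts.setdefault(x, [0.0] * self.num_z)[z] += 1.0

    def observe_outcome(self, z, y):
        # Delayed outcome; shared across all instances through z.
        self.y_sum[z] += y
        self.y_n[z] += 1.0

    def p_z_given_x(self, x):
        counts = self.z_counts.get(x, [0.0] * self.num_z)
        total = sum(counts) + self.smoothing * self.num_z
        return [(c + self.smoothing) / total for c in counts]

    def p_y_given_z(self, z):
        return (self.y_sum[z] + self.smoothing) / (self.y_n[z] + 2.0 * self.smoothing)

    def predict(self, x):
        # P(y = 1 | x) = sum over z of P(z | x) * P(y = 1 | z)
        return sum(p * self.p_y_given_z(z)
                   for z, p in enumerate(self.p_z_given_x(x)))


# Hypothetical data: z = 1 means "read a third of the book within 24 hours",
# y = 1 means "finished the book".
forecaster = TabularFactoredForecaster(num_z=2)

# An old book with both intermediate observations and final outcomes.
for z, y in [(1, 1)] * 8 + [(0, 0)] * 2:
    forecaster.observe_intermediate("old_book", z)
    forecaster.observe_outcome(z, y)

# A new book: intermediate observations only, no outcomes yet.
for z in [1, 0, 0, 0, 0]:
    forecaster.observe_intermediate("new_book", z)

old_pred = forecaster.predict("old_book")
new_pred = forecaster.predict("new_book")
print(old_pred, new_pred)
```

In this sketch the new book receives an informed prediction before any of its own outcomes arrive, because the outcome model P(y | z) is learned from older items. A forecaster that ignores z would have nothing item-specific to learn from until the delayed outcomes are observed.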


research
06/03/2020

Non-Stationary Bandits with Intermediate Observations

Online recommender systems often face long delays in receiving feedback,...
research
10/29/2020

Targeting for long-term outcomes

Decision-makers often want to target interventions (e.g., marketing camp...
research
07/19/2023

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

Recommender systems are a ubiquitous feature of online platforms. Increa...
research
01/17/2022

Exploit Customer Life-time Value with Memoryless Experiments

As a measure of the long-term contribution produced by customers in a se...
research
09/01/2020

From Clicks to Conversions: Recommendation for long-term reward

Recommender systems are often optimised for short-term reward: a recomme...
research
11/08/2020

Skewed Laplace Spectral Mixture kernels for long-term forecasting in Gaussian process

Long-term forecasting involves predicting a horizon that is far ahead of...
research
10/28/2022

Continuous Attribution of Episodical Outcomes for More Efficient and Targeted Online Measurement

Online experimentation platforms collect user feedback at low cost and l...
