Toward Understanding Privileged Features Distillation in Learning-to-Rank

09/19/2022
by   Shuo Yang, et al.

In learning-to-rank problems, a privileged feature is one that is available during model training but not at test time. Such features naturally arise in merchandised recommendation systems; for instance, "user clicked this item" as a feature is predictive of "user purchased this item" in the offline data, but is clearly not available during online serving. Privileged features can also be ones that are too expensive to compute online but feasible to add offline. Privileged features distillation (PFD) refers to a natural idea: train a "teacher" model using all features (including privileged ones) and then use it to train a "student" model that does not use the privileged features. In this paper, we first study PFD empirically on three public ranking datasets and an industrial-scale ranking problem derived from Amazon's logs. We show that PFD outperforms several baselines (no-distillation, pretraining-finetuning, self-distillation, and generalized distillation) on all these datasets. Next, we analyze why and when PFD performs well via both empirical ablation studies and theoretical analysis for linear models. Both investigations uncover an interesting non-monotone behavior: as the predictive power of a privileged feature increases, the performance of the resulting student model initially increases but then decreases. We show that the reason for the eventual decrease is that a very predictive privileged teacher produces predictions with high variance, which in turn lead to high-variance student estimates and inferior testing performance.
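The following is a minimal sketch of the PFD recipe described above, not the paper's exact implementation: the feature names, the gradient-boosted regressors standing in for the ranking models, and the blended teacher/label target with weight alpha are all illustrative assumptions.

```python
# Privileged features distillation (PFD), minimal pointwise-ranking sketch.
# Assumptions: synthetic data, GradientBoostingRegressor as both teacher and
# student, and a label/teacher-score blend as the distillation target.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
X_regular = rng.normal(size=(n, 8))   # features available at serving time
X_priv = rng.normal(size=(n, 2))      # privileged features (training only)
y = X_regular[:, 0] + 2.0 * X_priv[:, 0] + 0.1 * rng.normal(size=n)

# 1) Teacher: trained on regular + privileged features.
X_all = np.hstack([X_regular, X_priv])
teacher = GradientBoostingRegressor().fit(X_all, y)
teacher_scores = teacher.predict(X_all)

# 2) Student: trained only on regular features, against a blend of the
#    ground-truth label and the teacher's soft scores (alpha is hypothetical).
alpha = 0.5
student_target = (1 - alpha) * y + alpha * teacher_scores
student = GradientBoostingRegressor().fit(X_regular, student_target)

# At test time only the regular features are needed.
print(student.predict(X_regular[:5]))
```

With squared-error loss, fitting this blended target is equivalent to weighting a label-matching term and a teacher-matching term, which is one common way to express the distillation objective; the paper's analysis concerns how the student's quality varies with the predictiveness of X_priv.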



Related research

09/19/2018 · Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System
We propose a novel way to train ranking models, such as recommender syst...

07/11/2019 · Privileged Features Distillation for E-Commerce Recommendations
Features play an important role in most prediction tasks of e-commerce r...

06/04/2021 · Churn Reduction via Distillation
In real-world systems, models are frequently updated as more data become...

09/30/2021 · Born Again Neural Rankers
We introduce Born Again neural Rankers (BAR) in the Learning to Rank (LT...

12/06/2022 · Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
In this work, we present a novel method, named AV2vec, for learning audi...

12/06/2018 · Online Model Distillation for Efficient Video Inference
High-quality computer vision models typically address the problem of und...

06/13/2022 · Robust Distillation for Worst-class Performance
Knowledge distillation has proven to be an effective technique in improv...
