Distillation from Heterogeneous Models for Top-K Recommendation

03/02/2023
by SeongKu Kang, et al.

Recent recommender systems have shown remarkable performance by using an ensemble of heterogeneous models. However, serving such an ensemble is exceedingly costly: it requires resources and incurs inference latency proportional to the number of models, which remains the bottleneck for production. Our work aims to transfer the ensemble knowledge of heterogeneous teachers to a lightweight student model using knowledge distillation (KD), reducing the huge inference cost while retaining high accuracy. Through an empirical study, we find that the efficacy of distillation severely drops when transferring knowledge from heterogeneous teachers. Nevertheless, we show that an important signal for easing this difficulty can be obtained from the teachers' training trajectories. This paper proposes a new KD framework, named HetComp, that guides the student model by transferring easy-to-hard sequences of knowledge generated from the teachers' trajectories. To provide guidance according to the student's learning state, HetComp uses dynamic knowledge construction to provide progressively more difficult ranking knowledge and adaptive knowledge transfer to gradually transfer finer-grained ranking information. Our comprehensive experiments show that HetComp significantly improves the distillation quality and the generalization of the student model.
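
As a rough illustration of the two components named in the abstract, the sketch below shows one way an easy-to-hard distillation loop could look in plain NumPy. It is not the authors' implementation: the helper names (rank_discrepancy, dynamic_knowledge_construction, adaptive_transfer_loss), the fixed discrepancy threshold, and the particular loss forms are assumptions made for illustration. The only inputs it presumes are per-teacher ranking snapshots saved along each teacher's training trajectory, with earlier snapshots serving as "easier" knowledge.

```python
# Minimal sketch (an assumption, not the authors' code) of the two ideas in the
# abstract: dynamic knowledge construction over per-teacher ranking snapshots,
# and a coarse-to-fine (adaptive) transfer loss for the student.
import numpy as np


def rank_discrepancy(student_ranking, target_ranking, k=50):
    """Fraction of the target's top-k items missing from the student's top-k;
    a simple proxy for how far the student is from the current knowledge."""
    return 1.0 - len(set(student_ranking[:k]) & set(target_ranking[:k])) / k


def dynamic_knowledge_construction(student_ranking, trajectories, stages,
                                   threshold=0.3, k=50):
    """Advance each teacher to its next (harder) snapshot once the student has
    largely absorbed the current one; return the rankings to distill from."""
    targets = []
    for t_idx, snapshots in enumerate(trajectories):
        cur = stages[t_idx]
        if cur + 1 < len(snapshots) and \
                rank_discrepancy(student_ranking, snapshots[cur], k) < threshold:
            stages[t_idx] = cur + 1        # student caught up -> harder knowledge
        targets.append(snapshots[stages[t_idx]])
    return targets


def adaptive_transfer_loss(student_scores, target_ranking, fine_grained, k=50):
    """Coarse stage: push the target's top-k above the remaining items.
    Fine stage: additionally respect the ordering inside the top-k."""
    top = np.asarray(target_ranking[:k])
    rest = np.setdiff1d(np.arange(len(student_scores)), top)
    margin = student_scores[top].mean() - student_scores[rest].mean()
    loss = max(0.0, 1.0 - margin)                    # set-level (coarse) term
    if fine_grained:                                 # listwise (fine) term
        s = student_scores[top] - student_scores[top].max()  # for stability
        rev_cumsum = np.exp(s)[::-1].cumsum()[::-1]  # sum_{j>=i} exp(s_j)
        loss += -np.mean(np.log(np.exp(s) / rev_cumsum))
    return loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_items = 200
    # Three hypothetical teachers, each with three ranking snapshots along training.
    trajectories = [[rng.permutation(n_items) for _ in range(3)] for _ in range(3)]
    stages = [0, 0, 0]
    student_scores = rng.normal(size=n_items)
    student_ranking = np.argsort(-student_scores)
    targets = dynamic_knowledge_construction(student_ranking, trajectories, stages)
    print(sum(adaptive_transfer_loss(student_scores, t, fine_grained=False)
              for t in targets))
```

A real implementation would, as the abstract indicates, tie both the snapshot switching and the coarse-to-fine transition to the student's learning state; the fixed threshold above is only a stand-in for that mechanism.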

Related research

09/24/2019 · FEED: Feature-level Ensemble for Knowledge Distillation
Knowledge Distillation (KD) aims to transfer knowledge in a teacher-stud...

09/08/2021 · Dual Correction Strategy for Ranking Distillation in Top-N Recommender System
Knowledge Distillation (KD), which transfers the knowledge of a well-tra...

06/16/2021 · Topology Distillation for Recommender System
Recommender Systems (RS) have employed knowledge distillation which is a...

06/23/2020 · Distilling Object Detectors with Task Adaptive Regularization
Current state-of-the-art object detectors are at the expense of high com...

04/28/2022 · Curriculum Learning for Dense Retrieval Distillation
Recent work has shown that more effective dense retrieval models can be ...

09/30/2021 · Born Again Neural Rankers
We introduce Born Again neural Rankers (BAR) in the Learning to Rank (LT...

05/31/2019 · The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning
Recent advances in deep learning have facilitated the demand of neural m...
