Bidirectional Distillation for Top-K Recommender System

06/05/2021
by Wonbin Kweon, et al.

Recommender systems (RS) have started to employ knowledge distillation, a model compression technique that trains a compact model (student) with knowledge transferred from a cumbersome model (teacher). The state-of-the-art methods rely on unidirectional distillation, which transfers knowledge only from the teacher to the student, under the assumption that the teacher is always superior to the student. However, we demonstrate that the student performs better than the teacher on a significant proportion of the test set, especially in RS. Based on this observation, we propose the Bidirectional Distillation (BD) framework, in which the teacher and the student collaboratively improve each other. Specifically, each model is trained with a distillation loss that makes it follow the other's predictions, along with its original loss function. For effective bidirectional distillation, we propose a rank discrepancy-aware sampling scheme that distills only the informative knowledge capable of fully enhancing each model. The proposed scheme is designed to effectively cope with the large performance gap between the teacher and the student. Trained in this bidirectional way, both the teacher and the student are significantly improved compared to when they are trained separately. Our extensive experiments on real-world datasets show that the proposed framework consistently outperforms state-of-the-art competitors. We also provide analyses for an in-depth understanding of BD and ablation studies that verify the effectiveness of each proposed component.
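The sketch below illustrates, in PyTorch, how the two ingredients described in the abstract could fit together: each model minimizes its original loss plus a distillation loss toward the other model's predictions, computed only on items drawn by a rank discrepancy-aware sampler. It is a minimal illustration under assumed interfaces; the names (rank_discrepancy_sample, distill_step, base_loss_fn, the lam weights, and the model(user, items) signature) are hypothetical and not taken from the paper's implementation.

```python
# Minimal sketch of the bidirectional distillation idea described in the abstract.
# All names and interfaces here are illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F


def rank_discrepancy_sample(scores_self, scores_other, num_samples):
    """Sample item indices whose ranks differ most between the two models,
    so that only 'informative' items are used for distillation (assumed scheme)."""
    ranks_self = scores_self.argsort(descending=True).argsort().float()
    ranks_other = scores_other.argsort(descending=True).argsort().float()
    discrepancy = (ranks_self - ranks_other).abs() + 1e-8  # avoid an all-zero weight vector
    probs = discrepancy / discrepancy.sum()
    return torch.multinomial(probs, num_samples, replacement=False)


def distill_step(model, other_scores, user, items, base_loss_fn, optimizer,
                 lam=0.5, num_samples=10):
    """One update for `model`: its original loss plus a distillation loss that
    pushes its predictions toward the other model's predictions on sampled items."""
    scores = model(user, items)                      # (num_items,) predicted scores
    idx = rank_discrepancy_sample(scores.detach(), other_scores, num_samples)
    soft_targets = torch.sigmoid(other_scores[idx])  # other model's soft labels
    distill_loss = F.binary_cross_entropy_with_logits(scores[idx], soft_targets)
    loss = base_loss_fn(scores) + lam * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def bidirectional_step(teacher, student, user, items, base_loss_fn,
                       opt_teacher, opt_student, lam_t=0.1, lam_s=0.5):
    """Both models learn from each other in one step. Giving the teacher a
    smaller distillation weight is an assumed choice, not the paper's mechanism."""
    with torch.no_grad():
        t_scores = teacher(user, items)
        s_scores = student(user, items)
    distill_step(student, t_scores, user, items, base_loss_fn, opt_student, lam=lam_s)
    distill_step(teacher, s_scores, user, items, base_loss_fn, opt_teacher, lam=lam_t)
```

Weighting the teacher's distillation loss less than the student's is one plausible way to account for the performance gap the abstract mentions; in the paper itself, handling that gap is attributed to the rank discrepancy-aware sampling scheme.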

Related research

06/16/2021 · Topology Distillation for Recommender System
Recommender Systems (RS) have employed knowledge distillation which is a...

12/08/2020 · DE-RRD: A Knowledge Distillation Framework for Recommender System
Recent recommender systems have started to employ knowledge distillation...

03/09/2020 · Knowledge distillation via adaptive instance normalization
This paper addresses the problem of model compression via knowledge dist...

11/07/2019 · Teacher-Student Training for Robust Tacotron-based TTS
While neural end-to-end text-to-speech (TTS) is superior to conventional...

09/13/2020 · DistilE: Distiling Knowledge Graph Embeddings for Faster and Cheaper Reasoning
Knowledge Graph Embedding (KGE) is a popular method for KG reasoning and...

11/13/2019 · Collaborative Distillation for Top-N Recommendation
Knowledge distillation (KD) is a well-known method to reduce inference l...

04/10/2019 · Knowledge Squeezed Adversarial Network Compression
Deep network compression has been achieved notable progress via knowledg...
