Related papers:

- Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution
  Knowledge distillation has been used to transfer knowledge learned by a ...
- Privileged Knowledge Distillation for Online Action Detection
  Online Action Detection (OAD) in videos is proposed as a per-frame label...
- Privileged Features Distillation for E-Commerce Recommendations
  Features play an important role in most prediction tasks of e-commerce r...
- Distill2Vec: Dynamic Graph Representation Learning with Knowledge Distillation
  Dynamic graph representation learning strategies are based on different ...
- Distillation ≈ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network
  Distillation is a method to transfer knowledge from one model to another...
- EGAD: Evolving Graph Representation Learning with Self-Attention and Knowledge Distillation for Live Video Streaming Events
  In this study, we present a dynamic graph representation learning model ...
- Real-Time Correlation Tracking via Joint Model Compression and Transfer
  Correlation filters (CF) have received considerable attention in visual ...
Ensembled CTR Prediction via Knowledge Distillation
Recently, deep learning-based models have been widely studied for click-through rate (CTR) prediction and have led to improved prediction accuracy in many industrial applications. However, current research focuses primarily on building complex network architectures to better capture sophisticated feature interactions and dynamic user behaviors. This increased model complexity can slow down online inference and hinder adoption in real-time applications. Instead, our work targets a new model training strategy based on knowledge distillation (KD). KD is a teacher-student learning framework that transfers knowledge learned by a teacher model to a student model. The KD strategy not only allows us to simplify the student model to a vanilla DNN but also achieves significant accuracy improvements over the state-of-the-art teacher models. These benefits motivate us to further explore the use of a powerful ensemble of teachers for more accurate student model training. We also propose novel techniques to facilitate ensembled CTR prediction, including teacher gating and early stopping by distillation loss. We conduct comprehensive experiments against 12 existing models and across three industrial datasets. Both offline and online A/B testing results demonstrate the effectiveness of our KD-based training strategy.
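As a rough illustration of the teacher-student setup the abstract describes, the sketch below distills an ensemble of pretrained CTR teachers into a small student DNN in PyTorch. The specific gating rule (weighting each teacher per example by how well it predicts the true label), the temperature/alpha hyperparameters, and all names (StudentDNN, gated_teacher_logits, distill_step) are illustrative assumptions, not the paper's actual method; the paper's teacher gating and early-stopping criteria may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentDNN(nn.Module):
    """Vanilla MLP student for CTR prediction (outputs raw logits)."""
    def __init__(self, num_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # (batch,)

def gated_teacher_logits(teacher_logits, labels):
    """Hypothetical 'teacher gating': weight each teacher per example by
    how well it predicts the true label (lower BCE -> higher weight).
    teacher_logits: (num_teachers, batch); labels: (batch,) floats in {0, 1}."""
    with torch.no_grad():
        per_teacher_loss = F.binary_cross_entropy_with_logits(
            teacher_logits, labels.expand_as(teacher_logits), reduction="none"
        )                                              # (num_teachers, batch)
        gates = F.softmax(-per_teacher_loss, dim=0)    # favor accurate teachers
        return (gates * teacher_logits).sum(dim=0)     # (batch,)

def distill_step(student, optimizer, x, y, teacher_logits,
                 alpha: float = 0.5, temperature: float = 2.0):
    """One KD update: hard-label BCE on ground truth plus a soft-label
    BCE against the gated teacher, computed on temperature-scaled logits."""
    s_logits = student(x)
    t_logits = gated_teacher_logits(teacher_logits, y)
    hard_loss = F.binary_cross_entropy_with_logits(s_logits, y)
    soft_loss = F.binary_cross_entropy_with_logits(
        s_logits / temperature, torch.sigmoid(t_logits / temperature)
    )
    loss = alpha * hard_loss + (1.0 - alpha) * soft_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # The distillation term, tracked on validation data, could serve as the
    # early-stopping signal the abstract mentions (assumption).
    return soft_loss.item()

# Usage (shapes only; teachers are assumed pretrained and frozen):
#   x: (batch, num_features) float features
#   y: (batch,) float labels in {0, 1}
#   teacher_logits: (num_teachers, batch) stacked frozen-teacher outputs
```

In this sketch, monitoring the returned soft loss on held-out data and stopping when it plateaus would be one plausible reading of "early stopping by distillation loss"; the gating softmax collapses the ensemble into a single per-example teacher signal so the student trains against one target.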