Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation

by Zhen Tian, et al.

With the growth of high-dimensional sparse data in web-scale recommender systems, the computational cost of learning high-order feature interactions for the click-through rate (CTR) prediction task increases substantially, which limits the use of high-order interaction models in real industrial applications. Some recent knowledge distillation-based methods transfer knowledge from complex teacher models to shallow student models to accelerate online model inference. However, they suffer from degraded model accuracy during the distillation process, and it is challenging to balance the efficiency and effectiveness of the shallow student models. To address this problem, we propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) that learns high-order feature interactions from existing complex interaction models for CTR prediction via knowledge distillation. The proposed lightweight student model, DAGFM, can learn arbitrary explicit feature interactions from teacher networks and achieves approximately lossless performance, which is proved by means of a dynamic programming algorithm. In addition, an improved general model, KD-DAGFM+, is shown to be effective in distilling both explicit and implicit feature interactions from any complex teacher model. Extensive experiments are conducted on four real-world datasets, including a large-scale industrial dataset from the WeChat platform with billions of feature dimensions. KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments, showing the superiority of DAGFM in handling industrial-scale data for the CTR prediction task. Our implementation code is available at:
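The link between DAGFM and dynamic programming can be illustrated with a simplified sketch. It assumes a DAG over feature fields in which an edge i → j exists if and only if i ≤ j, a per-edge weight vector for each layer, and element-wise products of field embeddings. The function names `dag_interactions` and `brute_force` and this exact propagation rule are hypothetical illustrations, not the paper's implementation, and the paper's parameterization may differ. With every edge weight set to one, summing the order-t states over all fields reproduces the sum of all order-t interactions over non-decreasing field tuples, while the propagation costs O(T·m²·d) instead of enumerating every tuple.

```python
import itertools

import numpy as np


def dag_interactions(emb, weights):
    """Propagate states along a DAG over fields where edge i -> j exists iff i <= j.

    Hypothetical simplified setup (not the paper's exact model):
      emb: (m, d) array of field embeddings e_1..e_m.
      weights: list of (m, m, d) arrays; weights[k][i, j] is the edge vector
        used when building order-(k + 2) states (only i <= j is read).
    Returns a list of (m, d) arrays; entry t - 1 holds the order-t states,
    where row j aggregates all order-t interactions whose last field is j.
    """
    m, _ = emb.shape
    states = [emb.copy()]  # order-1 states: h_j = e_j
    for w in weights:
        prev = states[-1]
        nxt = np.zeros_like(emb)
        for j in range(m):
            # Dynamic-programming step: extend every interaction ending at a
            # field i <= j (weighted by the edge i -> j), then multiply in e_j.
            for i in range(j + 1):
                nxt[j] += w[i, j] * prev[i]
            nxt[j] *= emb[j]
        states.append(nxt)
    return states


def brute_force(emb, order):
    """Sum of e_{i1} * ... * e_{ik} over all non-decreasing index tuples."""
    total = np.zeros(emb.shape[1])
    for idx in itertools.combinations_with_replacement(range(emb.shape[0]), order):
        total += np.prod(emb[list(idx)], axis=0)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, d, max_order = 4, 3, 3
    emb = rng.normal(size=(m, d))
    ones = [np.ones((m, m, d)) for _ in range(max_order - 1)]
    # With unit edge weights, the DAG states enumerate every interaction once.
    for t, h in enumerate(dag_interactions(emb, ones), start=1):
        print(t, np.allclose(h.sum(axis=0), brute_force(emb, t)))
```

In a trained model the edge vectors would be learned (for example, by matching a teacher's outputs), so each interaction term receives its own effective weight rather than all being summed uniformly.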
