DistilE: Distiling Knowledge Graph Embeddings for Faster and Cheaper Reasoning

09/13/2020
by Yushan Zhu, et al.

Knowledge Graph Embedding (KGE) is a popular method for KG reasoning, and higher-dimensional embeddings usually provide stronger reasoning capability. However, high-dimensional KGEs pose huge challenges to storage and computing resources and are unsuitable for resource-constrained or time-constrained applications, which require faster and cheaper reasoning. To address this problem, we propose DistilE, a knowledge distillation method that builds a low-dimensional student KGE from a pre-trained high-dimensional teacher KGE. We take the original KGE loss as the hard-label loss and design KGE-specific soft-label losses in DistilE. We also propose a two-stage distillation approach that lets the student and teacher adapt to each other and further improves the student's reasoning capability. DistilE is general enough to be applied to various KGEs. Link prediction experiments show that our method distills a student that outperforms a directly trained model of the same dimension, and sometimes even the teacher, while achieving a 2x to 8x embedding compression rate and more than 10x faster inference than the teacher with only a small performance loss. An ablation study further demonstrates the effectiveness of the two-stage training proposal.
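The core recipe described in the abstract, a hard-label term (the original KGE loss on the student) combined with a soft-label term that matches the student's triple scores to the frozen teacher's, can be illustrated with a short sketch. The snippet below is a minimal illustration assuming a TransE-style scorer in PyTorch; the class TransE, the function distill_loss, and the parameters alpha and temperature are illustrative assumptions, not the paper's actual implementation or loss design.

# Minimal sketch of the hard/soft label distillation idea from the abstract.
# Assumes a TransE-style scorer and PyTorch; names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransE(nn.Module):
    def __init__(self, n_entities, n_relations, dim):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)

    def score(self, h, r, t):
        # Higher score means a more plausible triple (negative L2 distance).
        return -(self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)

def distill_loss(student, teacher, pos, neg, alpha=0.5, temperature=1.0):
    """Hard-label KGE loss on the student plus a soft-label loss that
    matches the student's triple scores to the frozen teacher's."""
    h, r, t = pos          # positive triples, each a LongTensor of shape (batch,)
    h_n, r_n, t_n = neg    # sampled negative triples, same shapes

    s_pos = student.score(h, r, t)
    s_neg = student.score(h_n, r_n, t_n)

    # Hard-label loss: the usual margin-based ranking objective.
    hard = F.relu(1.0 - s_pos + s_neg).mean()

    # Soft-label loss: follow the (frozen) teacher's score distribution
    # over positive vs. negative candidates.
    with torch.no_grad():
        t_pos = teacher.score(h, r, t)
        t_neg = teacher.score(h_n, r_n, t_n)
    soft = F.kl_div(
        F.log_softmax(torch.stack([s_pos, s_neg], dim=1) / temperature, dim=1),
        F.softmax(torch.stack([t_pos, t_neg], dim=1) / temperature, dim=1),
        reduction="batchmean",
    )
    return alpha * hard + (1 - alpha) * soft

# Usage (illustrative): distill a 64-d student from a frozen 512-d teacher.
# teacher = TransE(n_entities, n_relations, 512)   # pre-trained, kept frozen
# student = TransE(n_entities, n_relations, 64)
# loss = distill_loss(student, teacher, pos_batch, neg_batch)

The paper's two-stage proposal, in which the teacher is also adapted to the student, would wrap a loss like this in an additional training phase; that stage is not sketched here.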

Related research

10/14/2020 | Multi-teacher Knowledge Distillation for Knowledge Graph Completion
Link prediction based on knowledge graph embedding (KGE) aims to predict...

06/07/2022 | Improving Knowledge Graph Embedding via Iterative Self-Semantic Knowledge Distillation
Knowledge graph embedding (KGE) has been intensively investigated for li...

06/05/2021 | Bidirectional Distillation for Top-K Recommender System
Recommender systems (RS) have started to employ knowledge distillation, ...

06/04/2021 | ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression
Pretrained language models (PLMs) such as BERT adopt a training paradigm...

10/09/2021 | Visualizing the embedding space to explain the effect of knowledge distillation
Recent research has found that knowledge distillation can be effective i...

09/23/2022 | Descriptor Distillation: a Teacher-Student-Regularized Framework for Learning Local Descriptors
Learning a fast and discriminative patch descriptor is a challenging top...

08/02/2019 | Distilling Knowledge From a Deep Pose Regressor Network
This paper presents a novel method to distill knowledge from a deep pose...
