Multi-level Knowledge Distillation

12/01/2020
by Fei Ding, et al.

Knowledge distillation has become an important technique for model compression and acceleration. Conventional knowledge distillation approaches transfer knowledge from teacher to student networks by minimizing the KL-divergence between their probabilistic outputs, which considers only the mutual relationship between individual representations of the teacher and student networks. Recently, contrastive loss-based knowledge distillation has been proposed to enable a student to learn the instance-discriminative knowledge of a teacher by mapping representations of the same image close together and those of different images far apart in the representation space. However, all of these methods ignore that the teacher's knowledge is multi-level, e.g., at the individual, relational, and categorical levels. These different levels of knowledge cannot be effectively captured by a single kind of supervisory signal. Here, we introduce Multi-level Knowledge Distillation (MLKD) to transfer richer representational knowledge from teacher to student networks. MLKD employs three novel teacher-student similarities: individual similarity, relational similarity, and categorical similarity, which encourage the student network to learn sample-wise, structure-wise, and category-wise knowledge from the teacher network. Experiments demonstrate that MLKD outperforms other state-of-the-art methods on both similar-architecture and cross-architecture tasks. We further show that MLKD can improve the transferability of the learned representations in the student network.
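The abstract names the three similarities but does not give their loss formulations. As a rough illustration of the general recipe, the PyTorch sketch below combines the conventional KL-divergence term with one plausible instantiation of sample-wise, structure-wise, and category-wise alignment losses. All function names, the per-class-centroid proxy for categorical similarity, and the weights alpha/beta/gamma are assumptions made for illustration, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Conventional KD: KL-divergence between temperature-softened outputs."""
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def individual_similarity_loss(f_s, f_t):
    """Sample-wise: align each student embedding with its teacher counterpart."""
    f_s, f_t = F.normalize(f_s, dim=1), F.normalize(f_t, dim=1)
    return (1.0 - (f_s * f_t).sum(dim=1)).mean()

def relational_similarity_loss(f_s, f_t):
    """Structure-wise: match pairwise similarity matrices over the batch."""
    g_s = F.normalize(f_s, dim=1) @ F.normalize(f_s, dim=1).t()
    g_t = F.normalize(f_t, dim=1) @ F.normalize(f_t, dim=1).t()
    return F.mse_loss(g_s, g_t)

def categorical_similarity_loss(f_s, f_t, labels, num_classes):
    """Category-wise: align per-class mean embeddings. This centroid-based
    proxy is an assumption; the paper's exact formulation may differ."""
    loss, n_present = f_s.new_zeros(()), 0
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            mu_s = F.normalize(f_s[mask].mean(dim=0), dim=0)
            mu_t = F.normalize(f_t[mask].mean(dim=0), dim=0)
            loss = loss + (1.0 - (mu_s * mu_t).sum())
            n_present += 1
    return loss / max(n_present, 1)

def mlkd_loss(student_logits, teacher_logits, f_s, f_t, labels, num_classes,
              alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical combined objective: KL term plus the three similarities."""
    return (kd_loss(student_logits, teacher_logits)
            + alpha * individual_similarity_loss(f_s, f_t)
            + beta * relational_similarity_loss(f_s, f_t)
            + gamma * categorical_similarity_loss(f_s, f_t, labels, num_classes))
```

Normalizing the embeddings before comparison makes each term scale-invariant, so the three losses can be summed without one dominating purely because of feature magnitude; the relative weights would still need to be tuned per task.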

Related research

- Contrastive Representation Distillation (10/23/2019): Often we wish to transfer representational knowledge from one neural net...
- Search for Better Students to Learn Distilled Knowledge (01/30/2020): Knowledge Distillation, as a model compression technique, has received g...
- Similarity of Neural Architectures Based on Input Gradient Transferability (10/20/2022): In this paper, we aim to design a quantitative similarity function betwe...
- Image-to-Video Re-Identification via Mutual Discriminative Knowledge Transfer (01/21/2022): The gap in representations between image and video makes Image-to-Video ...
- Similarity Transfer for Knowledge Distillation (03/18/2021): Knowledge distillation is a popular paradigm for learning portable neura...
- Knowledge Distillation via Token-level Relationship Graph (06/20/2023): Knowledge distillation is a powerful technique for transferring knowledg...
- Multi-Modality Distillation via Learning the teacher's modality-level Gram Matrix (12/21/2021): In the context of multi-modality knowledge distillation research, the ex...
