Similarity Transfer for Knowledge Distillation

03/18/2021
by Haoran Zhao, et al.

Knowledge distillation is a popular paradigm for learning portable neural networks by transferring the knowledge of a large model into a smaller one. Most existing approaches enhance the student model with the instance-level similarity information between categories provided by the teacher model. However, these works ignore the similarity correlations between different instances, which play an important role in confidence prediction. To tackle this issue, we propose a novel method, called similarity transfer for knowledge distillation (STKD), which aims to fully utilize the similarities between the categories of multiple samples. Furthermore, we capture the similarity correlations between different instances with the mixup technique, which creates virtual samples by weighted linear interpolation. Note that our distillation loss can fully exploit the similarities among the incorrect classes through the mixed labels. The proposed approach improves the student model because a virtual sample created from multiple images induces similar probability distributions in the teacher and student networks. Experiments and ablation studies on several public classification datasets, including CIFAR-10, CIFAR-100, CINIC-10 and Tiny-ImageNet, verify that this lightweight method effectively boosts the performance of the compact student model. The results show that STKD substantially outperforms vanilla knowledge distillation and achieves superior accuracy over state-of-the-art knowledge distillation methods.
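
For concreteness, the idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the helper names (mixup, distill_loss) and the hyperparameters (alpha, temperature T, weight beta) are assumptions, and the sketch simply combines the standard mixup interpolation with a Hinton-style softened KL distillation term applied to the virtual samples and their mixed labels, as the abstract describes.

```python
import torch
import torch.nn.functional as F

def mixup(x, y, alpha=1.0):
    # Hypothetical helper. Sample an interpolation weight lam ~ Beta(alpha, alpha)
    # and mix each sample with a randomly permuted partner from the same batch.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    # Virtual sample: weighted linear interpolation of two images and two labels.
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]  # y is one-hot, so y_mix is a soft label
    return x_mix, y_mix

def distill_loss(student_logits, teacher_logits, y_mix, T=4.0, beta=0.9):
    # Softened KL term: make the student match the teacher's probability
    # distribution on the virtual sample (Hinton-style, scaled by T^2).
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Cross-entropy against the mixed soft labels, which carry the
    # similarity structure of the incorrect classes.
    ce = -(y_mix * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()
    return beta * kd + (1.0 - beta) * ce
```

In a training step, one would mix a batch, run both networks on x_mix, and minimize distill_loss with the teacher's parameters frozen, so the student learns the teacher's behavior on the interpolated inputs.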

Related research

Correlation Congruence for Knowledge Distillation (04/03/2019)
Most teacher-student frameworks based on knowledge distillation (KD) dep...

Multi-level Knowledge Distillation (12/01/2020)
Knowledge distillation has become an important technique for model compr...

Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging (09/03/2020)
Knowledge Distillation (KD) is a popular area of research for reducing t...

Hint-dynamic Knowledge Distillation (11/30/2022)
Knowledge Distillation (KD) transfers the knowledge from a high-capacity...

Knowledge Distillation via Token-level Relationship Graph (06/20/2023)
Knowledge distillation is a powerful technique for transferring knowledg...

CoCo DistillNet: a Cross-layer Correlation Distillation Network for Pathological Gastric Cancer Segmentation (08/27/2021)
In recent years, deep convolutional neural networks have made significan...

Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation (02/23/2021)
Knowledge distillation refers to a technique of transferring the knowled...
