Deeply-Supervised Knowledge Distillation

02/16/2022
by Shiya Luo, et al.

Knowledge distillation aims to enhance the performance of a lightweight student model by exploiting the knowledge of a pre-trained, cumbersome teacher model. However, in traditional knowledge distillation, the teacher's predictions supervise only the last layer of the student model, so the shallow student layers may lack accurate training guidance during layer-by-layer back-propagation, which hinders effective knowledge transfer. To address this issue, we propose Deeply-Supervised Knowledge Distillation (DSKD), which fully utilizes the class predictions and feature maps of the teacher model to supervise the training of the shallow student layers. A loss-based weight allocation strategy is developed in DSKD to adaptively balance the learning process of each shallow layer and further improve student performance. Extensive experiments show that DSKD consistently outperforms state-of-the-art methods across various teacher-student pairs, confirming the effectiveness of the proposed method.
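The abstract describes two ingredients: deep supervision of the shallow student layers by the teacher's class predictions and feature maps, and a loss-based weight allocation over those layers. The sketch below illustrates one way this could be wired up in PyTorch; the auxiliary heads (`AuxHead`), the feature adapters, the temperature, and the exact weighting rule are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a deeply-supervised KD loss, assuming:
#  * the student exposes feature maps at a few shallow stages,
#  * each stage gets a small auxiliary classifier (hypothetical AuxHead),
#  * shallow student feature maps have already been adapted to match the
#    teacher's feature-map shapes,
#  * the loss-based weights are simply the normalized per-stage losses
#    (an assumed form of the paper's allocation strategy).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxHead(nn.Module):
    """Hypothetical auxiliary classifier attached to a shallow student stage."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(feat).flatten(1))


def kd_loss(student_logits, teacher_logits, T: float = 4.0) -> torch.Tensor:
    """Standard KL divergence between temperature-softened class predictions."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)


def dskd_loss(stage_logits, stage_feats, teacher_logits, teacher_feats,
              labels, alpha: float = 0.5, beta: float = 0.1):
    """Deeply-supervised KD loss over shallow stages plus the final layer.

    stage_logits: logits from each shallow auxiliary head, with the final
                  student logits as the last element.
    stage_feats / teacher_feats: adapted student and matching teacher
                  feature maps for the shallow stages (same shapes).
    """
    # Usual supervision of the last layer: hard labels + softened teacher predictions.
    final_logits = stage_logits[-1]
    total = F.cross_entropy(final_logits, labels) + alpha * kd_loss(final_logits, teacher_logits)

    # Per-stage losses for the shallow layers: class-prediction KD + feature matching.
    shallow = torch.stack([
        kd_loss(s_logit, teacher_logits) + beta * F.mse_loss(s_feat, t_feat)
        for s_logit, s_feat, t_feat in zip(stage_logits[:-1], stage_feats, teacher_feats)
    ])

    # Loss-based weight allocation (assumed form): weight each shallow stage by its
    # share of the current loss, detached so the weights themselves carry no gradient.
    weights = (shallow / shallow.sum().clamp_min(1e-8)).detach()
    return total + (weights * shallow).sum()
```

In practice, the adapted shallow features would come from small projection layers (e.g., 1x1 convolutions) mapping each student stage to the teacher's channel dimension; these adapters and the auxiliary heads would be trained jointly with the student and discarded at inference time.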

Related research

09/21/2021 · RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Intermediate layer knowledge distillation (KD) can improve the standard ...

12/30/2021 · Confidence-Aware Multi-Teacher Knowledge Distillation
Knowledge distillation is initially introduced to utilize additional sup...

03/12/2021 · Self-Feature Regularization: Self-Feature Distillation Without Teacher Models
Knowledge distillation is the process of transferring the knowledge from...

11/03/2020 · In Defense of Feature Mimicking for Knowledge Distillation
Knowledge distillation (KD) is a popular method to train efficient netwo...

02/22/2023 · Debiased Distillation by Transplanting the Last Layer
Deep models are susceptible to learning spurious correlations, even duri...

12/10/2021 · DisCo: Effective Knowledge Distillation For Contrastive Learning of Sentence Embeddings
Contrastive learning has been proven suitable for learning sentence embe...

10/09/2021 · Visualizing the embedding space to explain the effect of knowledge distillation
Recent research has found that knowledge distillation can be effective i...
