Preparing Lessons: Improve Knowledge Distillation with Better Supervision

11/18/2019
by Tiancheng Wen, et al.

Knowledge distillation (KD) is widely used to train a compact model under the supervision of a larger model, which can effectively improve performance. Previous methods mainly focus on two aspects: 1) training the student to mimic the representation space of the teacher; 2) training the model progressively or adding extra modules such as a discriminator. Knowledge from the teacher is useful, but it is not always correct when measured against the ground truth. Moreover, overly uncertain supervision also harms the result. We introduce two novel approaches, Knowledge Adjustment (KA) and Dynamic Temperature Distillation (DTD), to penalize bad supervision and improve the student model. Experiments on CIFAR-100, CINIC-10 and Tiny ImageNet show that our methods achieve encouraging performance compared with state-of-the-art methods. When combined with other KD-based methods, performance is further improved.
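For readers unfamiliar with the mechanics, the sketch below illustrates these ideas in standard PyTorch. The `kd_loss` function is the usual Hinton-style distillation objective; `adjust_teacher_logits`, `dynamic_temperature` and `dtd_kd_term` are only hypothetical illustrations of what Knowledge Adjustment and Dynamic Temperature Distillation could look like. The abstract does not give the exact formulations, so the class-swap rule and the confidence-based temperature schedule here are assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard Hinton-style KD: cross-entropy on the hard labels plus
    KL divergence to the teacher's temperature-softened distribution."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - alpha) * hard + alpha * soft


def adjust_teacher_logits(teacher_logits, labels):
    """Sketch of 'knowledge adjustment' (KA): when the teacher's top-1
    prediction disagrees with the ground truth, swap the logits of the
    predicted class and the true class so the soft target no longer
    contradicts the label.  The paper's exact adjustment rule is not
    given in the abstract; this swap is an assumption."""
    adjusted = teacher_logits.clone()
    pred = adjusted.argmax(dim=1)
    rows = torch.nonzero(pred != labels, as_tuple=True)[0]
    true_logit = adjusted[rows, labels[rows]].clone()
    adjusted[rows, labels[rows]] = adjusted[rows, pred[rows]]
    adjusted[rows, pred[rows]] = true_logit
    return adjusted


def dynamic_temperature(teacher_logits, T_base=4.0, T_min=1.0):
    """Sketch of 'dynamic temperature distillation' (DTD): choose a
    per-sample temperature from the teacher's confidence so that overly
    uncertain soft targets are sharpened rather than flattened further.
    The linear mapping below is an assumed heuristic."""
    conf = F.softmax(teacher_logits, dim=1).max(dim=1).values  # (B,) in (0, 1)
    return T_min + (T_base - T_min) * conf  # low confidence -> low temperature


def dtd_kd_term(student_logits, teacher_logits, T):
    """Distillation term with a per-sample temperature T of shape (B,)."""
    T = T.unsqueeze(1)
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    per_sample_kl = F.kl_div(log_p, q, reduction="none").sum(dim=1)
    return (per_sample_kl * T.squeeze(1) ** 2).mean()
```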

