Knowledge Condensation Distillation

07/12/2022
by Chenxin Li, et al.

Knowledge Distillation (KD) transfers the knowledge from a high-capacity teacher network to strengthen a smaller student. Existing methods focus on mining knowledge hints and transferring the whole of the teacher's knowledge to the student. However, this introduces knowledge redundancy, because the same knowledge holds different value for the student at different learning stages. In this paper, we propose Knowledge Condensation Distillation (KCD). Specifically, the knowledge value of each sample is estimated dynamically, and an Expectation-Maximization (EM) framework is built on these estimates to iteratively condense a compact knowledge set from the teacher that guides the student's learning. Our approach is easy to build on top of off-the-shelf KD methods, with no extra training parameters and negligible computational overhead. It thus offers a new perspective on KD: a student that actively identifies the teacher's knowledge in line with its own aptitude learns more effectively and efficiently. Experiments on standard benchmarks show that KCD boosts the performance of the student model while achieving higher distillation efficiency. Code is available at https://github.com/dzy3/KCD.
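The abstract does not spell out how the knowledge value is estimated or how the EM iterations are organized, so the following is only a minimal PyTorch sketch of the general idea, not the paper's actual procedure (see the linked repository for that). It assumes a standard logit-based KD loss, uses the per-sample teacher-student KL divergence as a stand-in for the knowledge value, and takes a simple top-k selection within each batch as the "condensed" knowledge set. The function names (estimate_knowledge_value, condense_and_distill), the keep_ratio, and the temperature T are all hypothetical choices for illustration.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard KD loss: KL between temperature-softened distributions."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)


@torch.no_grad()
def estimate_knowledge_value(student, teacher, inputs, T=4.0):
    """E-step stand-in: score each sample by how much the teacher's output
    still differs from the current student's, via per-sample KL divergence."""
    p_t = F.softmax(teacher(inputs) / T, dim=1)
    log_p_s = F.log_softmax(student(inputs) / T, dim=1)
    return (p_t * (p_t.clamp_min(1e-8).log() - log_p_s)).sum(dim=1)


def condense_and_distill(student, teacher, loader, optimizer,
                         keep_ratio=0.7, T=4.0):
    """One EM-style round (sketch): score samples, keep the top keep_ratio
    fraction as the condensed knowledge set, then distill only on that subset."""
    student.train()
    teacher.eval()
    for inputs, targets in loader:
        values = estimate_knowledge_value(student, teacher, inputs, T)
        k = max(1, int(keep_ratio * inputs.size(0)))
        keep = torch.topk(values, k).indices        # condensed knowledge set
        x, y = inputs[keep], targets[keep]

        s_logits = student(x)
        with torch.no_grad():
            t_logits = teacher(x)

        loss = F.cross_entropy(s_logits, y) + kd_loss(s_logits, t_logits, T)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In the paper's framing, the E-step re-estimates knowledge value and the M-step updates the student on the condensed set; the batch-wise version above only illustrates the selection-then-distillation pattern on top of a vanilla KD loss.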


Related research

09/23/2021  Dynamic Knowledge Distillation for Pre-trained Language Models
Knowledge distillation (KD) has been proved effective for compressing la...

04/19/2021  Distilling Knowledge via Knowledge Review
Knowledge distillation transfers knowledge from the teacher network to t...

08/15/2021  Multi-granularity for knowledge distillation
Considering the fact that students have different abilities to understan...

01/18/2022  It's All in the Head: Representation Knowledge Distillation through Classifier Sharing
Representation knowledge distillation aims at transferring rich informat...

04/11/2020  Inter-Region Affinity Distillation for Road Marking Segmentation
We study the problem of distilling knowledge from a large deep teacher n...

05/12/2018  Born Again Neural Networks
Knowledge distillation (KD) consists of transferring knowledge from one ...

08/18/2020  Knowledge Transfer via Dense Cross-Layer Mutual-Distillation
Knowledge Distillation (KD) based methods adopt the one-way Knowledge Tr...
