Residual Knowledge Distillation

02/21/2020
by Mengya Gao, et al.

Knowledge distillation (KD) is one of the most effective approaches to model compression. The key idea is to transfer knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance degradation due to the substantial gap between the learning capacities of S and T. To remedy this problem, this work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A). Specifically, S is trained to mimic the feature maps of T, and A aids this process by learning the residual error between them. In this way, S and A complement each other to extract better knowledge from T. Furthermore, we devise an effective method to derive S and A from a given model without increasing the total computational cost. Extensive experiments show that our approach achieves appealing results on the popular classification datasets CIFAR-100 and ImageNet, surpassing state-of-the-art methods.
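As a rough illustration of the training signal described in the abstract, the sketch below (plain PyTorch; the tensor names f_t, f_s, f_a and the simple L2 feature-matching loss are assumptions chosen for illustration, not the authors' implementation) pairs a student loss that mimics the teacher's feature maps with an assistant loss that targets the residual error left by the student:

import torch
import torch.nn.functional as F

def rkd_losses(f_t, f_s, f_a):
    """Residual-style distillation losses on feature maps.

    f_t: teacher feature map, shape (N, C, H, W)
    f_s: student feature map, same shape
    f_a: assistant feature map, same shape
    """
    # Teacher features serve as a fixed target.
    f_t = f_t.detach()
    # Student tries to reproduce the teacher's features directly.
    loss_student = F.mse_loss(f_s, f_t)
    # Assistant learns the residual the student failed to capture.
    residual = f_t - f_s.detach()
    loss_assistant = F.mse_loss(f_a, residual)
    return loss_student, loss_assistant

During training, both terms would be added to the usual task loss; how S and A are derived from a single model so that the total computational cost stays constant is specific to the paper and not reproduced here.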


research · 06/08/2020
ResKD: Residual-Guided Knowledge Distillation
Knowledge distillation has emerged as a promising technique for compressi...

research · 07/10/2020
Distillation Guided Residual Learning for Binary Convolutional Neural Networks
It is challenging to bridge the performance gap between Binary CNN (BCNN...

research · 08/27/2020
MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
Knowledge Distillation (KD) has been one of the most popular methods to...

research · 12/06/2022
Leveraging Different Learning Styles for Improved Knowledge Distillation
Learning style refers to a type of training mechanism adopted by an indi...

research · 03/26/2021
Distilling a Powerful Student Model via Online Knowledge Distillation
Existing online knowledge distillation approaches either adopt the stude...

research · 09/10/2019
Knowledge Transfer Graph for Deep Collaborative Learning
We propose Deep Collaborative Learning (DCL), which is a method that inc...

research · 12/12/2021
Up to 100x Faster Data-free Knowledge Distillation
Data-free knowledge distillation (DFKD) has recently been attracting inc...
