Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

06/30/2020
by Hadi Pouransari, et al.

Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method that bridges this gap by (1) modeling student and teacher uncertainties, (2) sampling training examples from the underlying data distribution, and (3) matching student and teacher output distributions. We conduct extensive evaluations on regression and classification tasks and show that, compared to the original knowledge distillation, extracurricular learning reduces the gap by 46%. This leads to major accuracy improvements over empirical risk minimization-based training for various recent neural network architectures: +7.9% top-1 image classification accuracy on the CIFAR100 dataset, and +2.9% top-1 image classification accuracy on the ImageNet dataset.
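
For context, below is a minimal sketch of the standard knowledge distillation objective that extracurricular learning builds on: the student's softened output distribution is matched to the teacher's via a KL-divergence term, combined with the usual cross-entropy on ground-truth labels. The function name, temperature T, and mixing weight alpha are illustrative assumptions, not the paper's exact formulation (which additionally models uncertainties and samples beyond the empirical distribution).

```python
# Hedged sketch of Hinton-style knowledge distillation (not the paper's full method).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """KD loss: alpha * KL(teacher || student at temperature T) + (1 - alpha) * CE."""
    # Softened output distributions at temperature T.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # KL divergence between the softened teacher and student distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    # Standard supervised term on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```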

