Learning What You Should Learn

12/11/2022
by Shitong Shao, et al.

In real teaching scenarios, an excellent teacher always teaches what he or she is good at but the student is not, which gives the student the best assistance in making up for weaknesses and becoming well-rounded overall. Enlightened by this, we introduce this idea into the knowledge distillation framework and propose a data-based distillation method named “Teaching what you Should Teach (TST)”. Specifically, TST contains a neural network-based data augmentation module with a priori bias that learns magnitudes and probabilities to generate suitable samples, thereby helping to find what the teacher is good at but the student is not. By training the data augmentation module and the generalized distillation paradigm alternately, a student model with excellent generalization ability can be obtained. To verify the effectiveness of TST, we conducted extensive comparative experiments on object recognition (CIFAR-100 and ImageNet-1k), detection (MS-COCO), and segmentation (Cityscapes) tasks. As the experiments demonstrate, TST achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct intriguing studies of TST, including how to solve the performance degradation caused by a stronger teacher and which magnitudes and probabilities the distillation framework needs.
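The abstract describes an alternating scheme: an augmentation module with learnable magnitudes and probabilities is tuned to surface samples that the teacher handles well but the student does not, and the student is then distilled on those samples. The sketch below illustrates one way such an alternation could look in PyTorch. It is a minimal, hypothetical sketch only: the `LearnableAugment` module, the two toy operations, the gap objective, and all names are assumptions for illustration, not the paper's released implementation.

```python
# Hypothetical sketch of alternating augmentation-module / student training.
# Not the authors' code; module names and losses are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableAugment(nn.Module):
    """Differentiable augmentation with learnable magnitudes and probabilities."""
    def __init__(self, num_ops: int = 2):
        super().__init__()
        # One magnitude and one probability logit per operation.
        self.magnitudes = nn.Parameter(torch.zeros(num_ops))
        self.prob_logits = nn.Parameter(torch.zeros(num_ops))

    def forward(self, x):
        probs = torch.sigmoid(self.prob_logits)   # soft "apply" probabilities
        mags = torch.tanh(self.magnitudes)        # bounded magnitudes
        # Op 0: brightness shift, gated by its learned probability.
        x = x + probs[0] * mags[0]
        # Op 1: contrast scaling around the per-image mean.
        mean = x.mean(dim=(2, 3), keepdim=True)
        x = mean + (1 + probs[1] * mags[1]) * (x - mean)
        return x.clamp(0, 1)

def kd_loss(student_logits, teacher_logits, T: float = 4.0):
    """Standard soft-label distillation loss (KL divergence at temperature T)."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def train_step(images, labels, teacher, student, aug, opt_s, opt_a):
    teacher.eval()

    # Phase 1: update the augmentation module to expose samples on which the
    # teacher and student disagree most (teacher knows, student does not).
    # The teacher branch is treated as fixed for simplicity.
    aug_images = aug(images)
    with torch.no_grad():
        t_logits = teacher(aug_images)
    s_logits = student(aug_images)
    gap = kd_loss(s_logits, t_logits)
    opt_a.zero_grad()
    (-gap).backward()      # maximise the teacher-student gap w.r.t. aug params
    opt_a.step()

    # Phase 2: distill the student on the newly generated samples.
    aug_images = aug(images).detach()
    with torch.no_grad():
        t_logits = teacher(aug_images)
    s_logits = student(aug_images)
    loss = kd_loss(s_logits, t_logits) + F.cross_entropy(s_logits, labels)
    opt_s.zero_grad()
    loss.backward()
    opt_s.step()
    return loss.item()
```

In this reading, `opt_a` optimizes only the augmentation parameters and `opt_s` only the student, so each phase moves one component while the other stays fixed; the two phases are simply interleaved over training batches.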


Related research

11/23/2021 · Semi-Online Knowledge Distillation
Knowledge distillation is an effective and stable method for model compr...

05/18/2023 · Student-friendly Knowledge Distillation
In knowledge distillation, the knowledge from the teacher model is often...

11/01/2020 · MixKD: Towards Efficient Distillation of Large-scale Language Models
Large-scale language models have recently demonstrated impressive empiri...

04/19/2020 · Role-Wise Data Augmentation for Knowledge Distillation
Knowledge Distillation (KD) is a common method for transferring the “kno...

03/09/2023 · Learn More for Food Recognition via Progressive Self-Distillation
Food recognition has a wide range of applications, such as health-aware ...

09/23/2021 · Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing
We aim to identify how different components in the KD pipeline affect th...

05/24/2023 · HARD: Hard Augmentations for Robust Distillation
Knowledge distillation (KD) is a simple and successful method to transfe...
