Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation

05/16/2023
by Yuxin Ren, et al.

It has been commonly observed that a teacher model with superior performance does not necessarily yield a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer. To better guide the teacher's training process, we introduce the concept of distillation influence, which quantifies how distilling from each training sample affects the student's generalization ability. In this paper, we propose Learning Good Teacher Matters (LGTM), an efficient training technique that incorporates distillation influence into the teacher's learning process. By prioritizing samples that are likely to enhance the student's generalization ability, LGTM outperforms 10 common knowledge distillation baselines on 6 text classification tasks in the GLUE benchmark.
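To make the idea of prioritizing samples concrete, here is a minimal sketch of a knowledge distillation loss whose per-sample contributions are re-weighted by influence scores. This is an illustration only, not the paper's actual LGTM algorithm: the function name, the `influence_weights` argument, and the loss mixing via `alpha` are assumptions for the sketch, and the estimation of distillation influence itself is not reproduced here.

```python
# Illustrative sketch: per-sample weighted knowledge distillation loss.
# The influence weights stand in for an estimate of how much distilling
# from each sample helps the student generalize (not the paper's method).
import torch
import torch.nn.functional as F

def weighted_distillation_loss(student_logits, teacher_logits, labels,
                               influence_weights, temperature=2.0, alpha=0.5):
    # Soft-target KL term, kept per-sample so it can be re-weighted.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="none",
    ).sum(dim=-1) * (temperature ** 2)

    # Standard cross-entropy on hard labels, also per-sample.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Mix the two terms, then weight each sample by its influence score.
    per_sample = alpha * kd + (1.0 - alpha) * ce
    return (influence_weights * per_sample).mean()

# Toy usage with random tensors (4 samples, 3 classes).
s = torch.randn(4, 3)                    # student logits
t = torch.randn(4, 3)                    # teacher logits
y = torch.tensor([0, 2, 1, 0])           # gold labels
w = torch.tensor([1.0, 0.2, 0.8, 0.5])   # hypothetical influence scores
print(weighted_distillation_loss(s, t, y, w).item())
```

In this sketch, samples with higher influence scores dominate the gradient, which is one simple way to realize the idea of emphasizing training examples that are more useful for the student.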


Related research

06/10/2021  Does Knowledge Distillation Really Work?
Knowledge distillation is a popular technique for training a small stude...

09/15/2023  One-Class Knowledge Distillation for Spoofing Speech Detection
The detection of spoofing speech generated by unseen algorithms remains ...

06/14/2022  SoTeacher: A Student-oriented Teacher Network Training Framework for Knowledge Distillation
How to train an ideal teacher for knowledge distillation is still an ope...

06/23/2023  GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models
Knowledge distillation is commonly used for compressing neural networks ...

10/16/2021  Sparse Distillation: Speeding Up Text Classification by Using Bigger Models
Distilling state-of-the-art transformer models into lightweight student ...

09/09/2017  Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
Knowledge distillation is a potential solution for model compression. Th...

07/11/2023  The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework
In the context of label-efficient learning on video data, the distillati...
