Learning to Teach with Student Feedback

09/10/2021
by Yitao Liu, et al.

Knowledge distillation (KD) has gained much attention due to its effectiveness in compressing large-scale pre-trained models. In typical KD methods, the small student model is trained to match the soft targets generated by the large teacher model. However, the interaction between student and teacher is one-way: the teacher is usually fixed once trained, resulting in static soft targets to be distilled. Because of this one-way interaction, the teacher cannot perceive the characteristics of the student or its training progress. To address this issue, we propose Interactive Knowledge Distillation (IKD), which also allows the teacher to learn to teach from the student's feedback. In particular, IKD trains the teacher model to generate specific soft targets at each training step for a certain student. Joint optimization of both teacher and student is achieved by two iterative steps: a course step that optimizes the student with the teacher's soft targets, and an exam step that optimizes the teacher with the student's feedback. IKD is a general framework that is orthogonal to most existing knowledge distillation methods. Experimental results show that IKD outperforms traditional KD methods on various NLP tasks.
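The abstract describes the course/exam alternation only at a high level. The following is a minimal, hypothetical PyTorch sketch of such a loop, not the paper's actual implementation: the toy data, the linear student, the temperature-free soft targets, the hyperparameters, and the use of a one-step meta-gradient (backpropagating the updated student's loss on an "exam" batch through its course-step update into the teacher) to realize "optimize teacher with the feedback of student" are all assumptions.

```python
# Hypothetical sketch of an interactive course/exam distillation loop.
# Everything beyond the two-step structure is assumed for illustration.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
NUM_CLASSES, DIM = 3, 8

teacher = torch.nn.Linear(DIM, NUM_CLASSES)              # stands in for the large teacher
student_w = torch.zeros(NUM_CLASSES, DIM, requires_grad=True)  # tiny functional student
student_b = torch.zeros(NUM_CLASSES, requires_grad=True)
teacher_opt = torch.optim.Adam(teacher.parameters(), lr=1e-2)
student_lr = 0.1

def student_logits(x, w, b):
    return x @ w.t() + b

for step in range(100):
    # Toy batches; in practice these would come from the task's training / held-out data.
    x_course = torch.randn(16, DIM)
    x_exam = torch.randn(16, DIM)
    y_exam = torch.randint(0, NUM_CLASSES, (16,))

    # --- Course step: the student matches the teacher's soft targets. ---
    soft = F.softmax(teacher(x_course), dim=-1)
    s_logits = student_logits(x_course, student_w, student_b)
    course_loss = F.kl_div(F.log_softmax(s_logits, dim=-1), soft, reduction="batchmean")
    g_w, g_b = torch.autograd.grad(course_loss, (student_w, student_b), create_graph=True)
    new_w = student_w - student_lr * g_w      # updated student, still differentiable
    new_b = student_b - student_lr * g_b      # w.r.t. the teacher's parameters

    # --- Exam step: the teacher is updated from the student's feedback. ---
    # Feedback here = the updated student's cross-entropy on an exam batch,
    # backpropagated through the course-step update into the teacher.
    exam_loss = F.cross_entropy(student_logits(x_exam, new_w, new_b), y_exam)
    teacher_opt.zero_grad()
    exam_loss.backward()
    teacher_opt.step()

    # Commit the student's course-step update outside the teacher's graph.
    with torch.no_grad():
        student_w.copy_(new_w)
        student_b.copy_(new_b)
```

In this sketch the teacher's training signal is the exam loss of the just-updated student, which is one plausible reading of "feedback"; the paper may define the feedback signal and the teacher's objective differently, and a real setup would distill full pre-trained models rather than a toy linear student.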

Related research

04/14/2021 - Annealing Knowledge Distillation
02/16/2023 - Fuzzy Knowledge Distillation from High-Order TSK to Low-Order TSK
12/16/2021 - Knowledge Distillation Leveraging Alternative Soft Targets from Non-Parallel Qualified Speech Data
05/23/2022 - LILA-BOTI: Leveraging Isolated Letter Accumulations By Ordering Teacher Insights for Bangla Handwriting Recognition
12/11/2020 - Reinforced Multi-Teacher Selection for Knowledge Distillation
12/26/2022 - Prototype-guided Cross-task Knowledge Distillation for Large-scale Models
12/16/2022 - Swing Distillation: A Privacy-Preserving Knowledge Distillation Framework
