Oracle Teacher: Towards Better Knowledge Distillation

11/05/2021
by   Ji Won Yoon, et al.

Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ a teacher model trained in a supervised manner, where the output labels are treated only as targets. Extending this supervised scheme, we introduce a new type of teacher model for KD, namely Oracle Teacher, that utilizes the embeddings of both the source inputs and the output labels to extract more accurate knowledge to be transferred to the student. The proposed model follows the encoder-decoder attention structure of the Transformer network, which allows the model to attend to related information from the output labels. Extensive experiments are conducted on three different sequence learning tasks: speech recognition, scene text recognition, and machine translation. The experimental results empirically show that the proposed model improves the students across these tasks while achieving a considerable speed-up in the teacher model's training time.
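To make the conventional KD scheme the abstract builds on concrete, here is a minimal sketch of the standard distillation objective (Hinton-style soft labels), written in pure Python. This illustrates only the baseline KD loss, not the paper's Oracle Teacher; the function names, temperature `T`, and mixing weight `alpha` are illustrative assumptions.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, target, T=2.0, alpha=0.5):
    """Conventional KD objective: a weighted sum of
    (1) cross-entropy with the hard target label, and
    (2) KL divergence between the softened teacher and student
        distributions, scaled by T^2 to keep gradient magnitudes
        comparable across temperatures."""
    p_s = softmax(student_logits)                 # hard-target branch
    ce = -math.log(p_s[target])
    p_t = softmax(teacher_logits, T)              # softened teacher
    p_s_T = softmax(student_logits, T)            # softened student
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s_T))
    return alpha * ce + (1.0 - alpha) * (T * T) * kl
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the hard-label cross-entropy remains, which is why distillation can be viewed as regularizing the student toward the teacher's output distribution.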


