Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher

02/09/2019
by Seyed-Iman Mirzadeh, et al.

Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too gigantic to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large pre-trained network (the teacher) is used to train a smaller network (the student). However, in this paper, we show that the student network's performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher; in other words, a teacher can effectively transfer its knowledge only to students up to a certain size, not smaller. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (the teacher assistant) to bridge the gap between the student and the teacher. We study the effect of the teacher assistant's size and extend the framework to multi-step distillation. Moreover, empirical and theoretical analyses are conducted to examine the teacher assistant knowledge distillation framework. Extensive experiments on the CIFAR-10 and CIFAR-100 datasets with plain CNN and ResNet architectures substantiate the effectiveness of our proposed approach.
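The sketch below illustrates the multi-step idea described in the abstract: distill the large teacher into an intermediate-sized assistant first, then distill the assistant into the small student. It is a minimal illustration assuming the standard Hinton-style distillation loss (softened KL term plus hard-label cross-entropy); the temperature, weighting, optimizer settings, and the `big_teacher`/`assistant`/`small_student`/`train_loader` names are placeholders for illustration, not details taken from the paper.

```python
# Minimal sketch of teacher-assistant (multi-step) knowledge distillation.
# Assumes a standard soft-target distillation loss; hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Softened KL term (scaled by T^2) plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def distill(teacher, student, loader, epochs=1, lr=1e-3, device="cpu"):
    """Train `student` to mimic `teacher` on `loader` using the loss above."""
    teacher.eval().to(device)
    student.train().to(device)
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_logits = teacher(x)        # teacher provides soft targets
            loss = distillation_loss(student(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Multi-step distillation: teacher -> teacher assistant -> student,
# rather than distilling the large teacher directly into the small student.
# `big_teacher`, `assistant`, `small_student`, `train_loader` are assumed
# to be defined by the caller:
# assistant = distill(big_teacher, assistant, train_loader)
# small_student = distill(assistant, small_student, train_loader)
```

The assistant sits between the teacher and the student in capacity, so each distillation step bridges a smaller gap than direct teacher-to-student distillation would.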


Related research:
- Distilling Knowledge via Intermediate Classifier Heads (02/28/2021)
- Densely Guided Knowledge Distillation using Multiple Teacher Assistants (09/18/2020)
- Knowledge Distillation via Weighted Ensemble of Teaching Assistants (06/23/2022)
- Highlight Every Step: Knowledge Distillation via Collaborative Teaching (07/23/2019)
- Distilling Calibrated Student from an Uncalibrated Teacher (02/22/2023)
- Knowledge Distillation via Instance-level Sequence Learning (06/21/2021)
- Annealing Knowledge Distillation (04/14/2021)
