
Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher

by Seyed-Iman Mirzadeh et al.

Despite the fact that deep neural networks are powerful models that achieve appealing results on many tasks, they are too large to be deployed on edge devices such as smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, in which a large pre-trained network (the teacher) is used to train a smaller network (the student). However, in this paper we show that student performance degrades when the gap between student and teacher is large: given a fixed student network, one cannot employ an arbitrarily large teacher; in other words, a teacher can effectively transfer its knowledge only to students down to a certain size, not arbitrarily smaller ones. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (the teacher assistant) to bridge the gap between the student and the teacher. We study the effect of teacher assistant size and extend the framework to multi-step distillation. Moreover, we conduct empirical and theoretical analyses of the teacher assistant knowledge distillation framework. Extensive experiments on the CIFAR-10 and CIFAR-100 datasets with plain CNN and ResNet architectures substantiate the effectiveness of our proposed approach.
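To make the distillation step concrete, the following is a minimal pure-Python sketch of the standard Hinton-style distillation loss that such a framework builds on: a weighted sum of hard-label cross-entropy and the KL divergence between temperature-softened teacher and student outputs. The logit values, the temperature `T=4.0`, and the weight `alpha=0.5` are illustrative assumptions, not values taken from the paper.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T yields a softer distribution.
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Hinton-style distillation loss for a single example:
    alpha * hard-label cross-entropy
    + (1 - alpha) * T^2 * KL(teacher || student) on softened outputs."""
    p_s_soft = softmax(student_logits, T)
    p_t_soft = softmax(teacher_logits, T)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t_soft, p_s_soft))
    hard = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-target gradients on the same scale as the hard loss.
    return alpha * hard + (1 - alpha) * (T ** 2) * kl

# In teacher assistant distillation, the same loss is applied in two stages:
# the teacher first distills into the assistant, then the assistant distills
# into the student. The logits below are hypothetical illustrative values.
teacher_logits = [4.0, 1.0, 0.5]    # large pre-trained teacher
assistant_logits = [3.0, 1.2, 0.6]  # intermediate-sized teacher assistant
student_logits = [2.0, 1.0, 0.8]    # small student network

stage1 = distillation_loss(assistant_logits, teacher_logits, label=0)
stage2 = distillation_loss(student_logits, assistant_logits, label=0)
```

The two-stage structure is the only change relative to plain distillation: each stage reuses the same loss, with the assistant playing the student role in stage one and the teacher role in stage two.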




Related Papers

Distilling Knowledge via Intermediate Classifier Heads

Densely Guided Knowledge Distillation using Multiple Teacher Assistants

Knowledge Distillation via Weighted Ensemble of Teaching Assistants

Highlight Every Step: Knowledge Distillation via Collaborative Teaching

Distilling Calibrated Student from an Uncalibrated Teacher

Knowledge Distillation via Instance-level Sequence Learning

Annealing Knowledge Distillation

Code Repositories


Using Teacher Assistants to Improve Knowledge Distillation



Biological-Scale Neural Networks



PyTorch Knowledge Distillation Framework



I use this repository to collect all of my experimental ideas for machine learning (neural networks).



My thesis project about combining multiple learning signals
