
Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher

02/09/2019
by   Seyed-Iman Mirzadeh, et al.

Despite the fact that deep neural networks are powerful models that achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large pre-trained network (a.k.a. teacher) is used to train a smaller (a.k.a. student) network. However, in this paper, we show that the student network's performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher; in other words, a teacher can effectively transfer its knowledge only to students down to a certain size, not smaller. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (a.k.a. teacher assistant) to bridge the gap between the student and the teacher. We study the effect of teacher assistant size and extend the framework to multi-step distillation. Moreover, empirical and theoretical analyses are conducted to analyze the teacher assistant knowledge distillation framework. Extensive experiments on the CIFAR-10 and CIFAR-100 datasets with plain CNN and ResNet architectures substantiate the effectiveness of our proposed approach.
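The teacher assistant scheme described above amounts to chaining standard knowledge distillation steps: the teacher is first distilled into a mid-sized assistant, and the assistant is then distilled into the final student. Below is a minimal PyTorch sketch of that two-step pipeline; the toy CNN widths, the temperature and loss weighting, and the random stand-in data are illustrative assumptions rather than the paper's exact configuration.

# Minimal sketch of multi-step (teacher -> assistant -> student) distillation.
# Model sizes, hyperparameters, and the plain-CNN architecture are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation loss: softened teacher targets + hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def distill(teacher, student, loader, epochs=1, lr=1e-3, device="cpu"):
    """Train `student` to match `teacher`'s softened outputs on `loader`."""
    teacher.eval().to(device)
    student.train().to(device)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_logits = teacher(x)
            loss = kd_loss(student(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def make_cnn(width, num_classes=10):
    """Toy plain CNN; `width` controls capacity (teacher > assistant > student)."""
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(width, num_classes),
    )

if __name__ == "__main__":
    from torch.utils.data import DataLoader, TensorDataset
    # Tiny random stand-in for CIFAR-10 so the sketch runs end to end;
    # in practice the teacher would already be trained on the real dataset.
    train_loader = DataLoader(
        TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))),
        batch_size=16,
    )
    teacher, assistant, student = make_cnn(128), make_cnn(32), make_cnn(8)
    assistant = distill(teacher, assistant, train_loader)   # step 1: T -> TA
    student = distill(assistant, student, train_loader)     # step 2: TA -> S

The same `distill` call can be repeated with several assistants of decreasing size, which is the multi-step generalization the abstract mentions.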


Related Research

02/28/2021 · Distilling Knowledge via Intermediate Classifier Heads
The crux of knowledge distillation – as a transfer-learning approach – i...

09/18/2020 · Densely Guided Knowledge Distillation using Multiple Teacher Assistants
With the success of deep neural networks, knowledge distillation which g...

06/23/2022 · Knowledge Distillation via Weighted Ensemble of Teaching Assistants
Knowledge distillation in machine learning is the process of transferrin...

07/23/2019 · Highlight Every Step: Knowledge Distillation via Collaborative Teaching
High storage and computational costs obstruct deep neural networks to be...

02/22/2023 · Distilling Calibrated Student from an Uncalibrated Teacher
Knowledge distillation is a common technique for improving the performan...

06/21/2021 · Knowledge Distillation via Instance-level Sequence Learning
Recently, distillation approaches are suggested to extract general knowl...

04/14/2021 · Annealing Knowledge Distillation
Significant memory and computational requirements of large deep neural n...

Code Repositories

Teacher-Assistant-Knowledge-Distillation

Using Teacher Assistants to Improve Knowledge Distillation: https://arxiv.org/pdf/1902.03393.pdf



MACH

Biological-Scale Neural Networks



Knowledge-Distillation-Pipeline

PyTorch Knowledge Distillation Framework



machine_learning

I use this repository to collect all of my crazy ideas for machine learning (neural networks).



mutual-knowledge-distillation

My thesis project about combining multiple learning signals

