Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network

03/09/2020
by Wonchul Son, et al.

There is a growing need for on-the-fly computation on very low-performance systems such as systems-on-chip (SoCs) and embedded devices. This paper presents pacemaker knowledge distillation, which uses an intermediate ensemble teacher so that convolutional neural networks can be deployed on such systems. In the on-the-fly setting, the student model uses 1xN on-the-fly filters while the teacher model uses ordinary NxN filters. Applying on-the-fly filters raises three issues when training the student. First, the compressed model keeps the same depth as the teacher but is unavoidably thin. Second, because only the horizontal receptive field can be used and the vertical one must be discarded, there is a large gap in capacity and parameter count between teacher and student. Third, distilling directly from the teacher is unstable and degrades performance. To solve these problems, we propose an intermediate teacher, named the pacemaker, for the on-the-fly student, so that the student can be trained step by step from the pacemaker and then from the original teacher. Experiments show that the proposed method yields significant accuracy improvements: on CIFAR-100 with WRN-40-4, accuracy increases by 5.39 points over conventional knowledge distillation, which itself performs even worse than the baseline. The proposed pacemaker knowledge distillation also resolves the training instability that occurs when conventional knowledge distillation is applied alone, by reducing the range of accuracy deviation.
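
To make the setting concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of an NxN teacher, a 1xN on-the-fly student of the same depth, and step-by-step distillation through an assumed pacemaker of intermediate width; the network widths, temperature, and loss weights are illustrative assumptions.

```python
# Minimal sketch of pacemaker-style distillation, assuming PyTorch.
# The architectures, widths, temperature, and loss weights below are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBlock(nn.Module):
    """Conv-BN-ReLU block; the kernel shape distinguishes teacher and student."""

    def __init__(self, in_ch, out_ch, kernel_size, padding):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


def make_net(kernel_size, padding, width, num_classes=100):
    """Tiny CIFAR-style classifier; depth stays the same across all models."""
    return nn.Sequential(
        ConvBlock(3, width, kernel_size, padding),
        ConvBlock(width, width, kernel_size, padding),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(width, num_classes),
    )


# Teacher: normal NxN (3x3) filters. Student: 1xN (1x3) on-the-fly filters,
# same depth but thin. Pacemaker: an intermediate teacher that bridges the
# capacity gap (modeled here as a narrower NxN net, which is an assumption).
teacher = make_net(kernel_size=3, padding=1, width=64)
pacemaker = make_net(kernel_size=3, padding=1, width=32)
student = make_net(kernel_size=(1, 3), padding=(0, 1), width=32)


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard Hinton-style KD loss: softened KL term plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def distill(learner, frozen_teacher, loader, epochs=1, lr=0.1):
    """One distillation stage: train `learner` against a frozen teacher."""
    frozen_teacher.eval()
    opt = torch.optim.SGD(learner.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                t_logits = frozen_teacher(images)
            loss = kd_loss(learner(images), t_logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()


# Step-by-step schedule implied by the abstract, with `train_loader` assumed
# to yield (image, label) batches of CIFAR-100:
#   distill(pacemaker, teacher, train_loader)   # optionally fit the pacemaker first
#   distill(student, pacemaker, train_loader)   # student learns from the pacemaker
#   distill(student, teacher, train_loader)     # then from the original teacher
```

In the paper's experiments the models are wide residual networks (e.g., WRN-40-4) rather than the two-block toy networks used here; the sketch only illustrates the kernel-shape difference and the two-stage schedule.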

