Embedded Knowledge Distillation in Depth-level Dynamic Neural Network

03/01/2021
by Shuchang Lyu, et al.

In real applications, devices with different computation resources need networks of different depths (e.g., ResNet-18/34/50) with high accuracy. Existing strategies usually either design multiple networks (nets) and train them independently, or apply compression techniques (e.g., low-rank decomposition, pruning, and teacher-to-student distillation) to shrink a trained large model into a small net. These methods suffer either from the low accuracy of the small nets or from complicated training processes caused by the dependence on accompanying assistive large models. In this article, we propose an elegant Depth-level Dynamic Neural Network (DDNN) that integrates sub-nets of different depths with similar architectures. Instead of training individual nets with different depth configurations, we train a single DDNN that dynamically switches between different-depth sub-nets at runtime using one set of shared weight parameters. To improve the generalization of the sub-nets, we design the Embedded-Knowledge-Distillation (EKD) training mechanism for the DDNN, which transfers semantic knowledge from the teacher (full) net to the multiple sub-nets. Specifically, the Kullback-Leibler divergence constrains the consistency of the posterior class probabilities between the full-net and each sub-net, and self-attention on same-resolution features at different depths drives richer feature representations in the sub-nets. Thus, we obtain multiple high-accuracy sub-nets simultaneously in one DDNN via online knowledge distillation at each training iteration, without extra computation cost. Extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that sub-nets in a DDNN trained with EKD achieve better performance than depth-level pruning or individually trained nets, while preserving the original performance of the full-net.
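The core training idea can be illustrated with a short sketch. The PyTorch-style code below is a minimal illustration under stated assumptions, not the authors' implementation: the toy model ToyDDNN, its depth argument, and the ekd_loss helper are hypothetical names, and the paper's self-attention feature term is omitted for brevity. It shows how one set of shared weights can serve sub-nets of several depths, and how a KL-divergence term pulls each sub-net's posterior toward the full-net's posterior within the same training iteration.

```python
# Minimal sketch of depth-level dynamic sub-net switching with a
# KL-divergence distillation term, assuming a PyTorch-style setup.
# All names (ToyDDNN, ekd_loss, depth) are illustrative only and
# do not come from the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyDDNN(nn.Module):
    """Stack of residual-style blocks; a sub-net reuses the full-net
    weights but executes only the first `depth` blocks."""

    def __init__(self, num_blocks=4, width=64, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, width, 3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(width, width, 3, padding=1),
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_blocks)
        )
        self.head = nn.Linear(width, num_classes)

    def forward(self, x, depth=None):
        depth = len(self.blocks) if depth is None else depth
        x = self.stem(x)
        for block in self.blocks[:depth]:  # shared weights, truncated depth
            x = x + block(x)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.head(x)


def ekd_loss(sub_logits, full_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy on labels plus KL divergence that pulls the
    sub-net posterior toward the (detached) full-net posterior."""
    ce = F.cross_entropy(sub_logits, labels)
    kl = F.kl_div(
        F.log_softmax(sub_logits / T, dim=1),
        F.softmax(full_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kl


# One training iteration: the full-net output supervises each sub-net
# online, so no separate teacher model or extra training stage is needed.
model = ToyDDNN()
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
full_logits = model(images)                # full-depth forward pass
loss = F.cross_entropy(full_logits, labels)
for depth in (1, 2, 3):                    # shallower sub-nets, shared weights
    sub_logits = model(images, depth=depth)
    loss = loss + ekd_loss(sub_logits, full_logits, labels)
loss.backward()
```

Detaching the full-net logits in the distillation term is a common online-distillation choice so the teacher branch is trained only by its own supervised loss; the paper's exact loss weighting and the feature-level self-attention term may differ from this sketch.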

Related research

09/26/2021 · Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better
Knowledge distillation field delicately designs various types of knowled...

12/25/2022 · BD-KD: Balancing the Divergences for Online Knowledge Distillation
Knowledge distillation (KD) has gained a lot of attention in the field o...

01/27/2022 · Dynamic Rectification Knowledge Distillation
Knowledge Distillation is a technique which aims to utilize dark knowled...

06/16/2023 · Squeezing nnU-Nets with Knowledge Distillation for On-Board Cloud Detection
Cloud detection is a pivotal satellite image pre-processing step that ca...

09/29/2020 · Deep discriminant analysis for task-dependent compact network search
Most of today's popular deep architectures are hand-engineered for gener...

03/17/2016 · Do Deep Convolutional Nets Really Need to be Deep and Convolutional?
Yes, they do. This paper provides the first empirical demonstration that...

11/10/2020 · Stage-wise Channel Pruning for Model Compression
Auto-ML pruning methods aim at searching a pruning strategy automaticall...
