DistPro: Searching A Fast Knowledge Distillation Process via Meta Optimization

04/12/2022
by   Xueqing Deng, et al.

Recent knowledge distillation (KD) studies show that different manually designed schemes affect the learned results significantly, yet automatically searching for an optimal distillation scheme has not been well explored. In this paper, we propose DistPro, a novel framework that searches for an optimal KD process via differentiable meta-learning. Specifically, given a pair of student and teacher networks, DistPro first sets up a rich set of KD connections from the transmitting layers of the teacher to the receiving layers of the student; meanwhile, various transforms are proposed for comparing feature maps along each pathway during distillation. Each combination of a connection and a transform choice (a pathway) is then associated with a stochastic weighting process that indicates its importance at every step of the distillation. In the searching stage, these processes are learned effectively through our proposed bi-level meta-optimization strategy. In the distillation stage, DistPro adopts the learned processes for knowledge distillation, which significantly improves student accuracy, especially when faster training is required. Finally, we find that the learned processes generalize between similar tasks and networks. In our experiments, DistPro achieves state-of-the-art (SoTA) accuracy under varying numbers of training epochs on popular datasets, i.e., CIFAR100 and ImageNet, demonstrating the effectiveness of our framework.
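For a concrete picture of the bi-level search described above, the following is a minimal sketch in PyTorch. It assumes hypothetical wrappers in which teacher(x) and student(x) return (feature_maps, logits), uses a 1x1 convolution as one example transform, collapses the stochastic weighting process to a single vector of pathway weights (alphas), and updates those weights with a one-step unrolled approximation of the bi-level objective; none of this is the authors' released code.

```python
# Minimal sketch of a bi-level pathway-weight search of the kind described above.
# Wrapper interfaces, the 1x1-conv transform, and the one-step unrolled update
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call


class Pathway(nn.Module):
    """One candidate KD pathway: a (teacher layer, student layer) pair plus a transform."""

    def __init__(self, s_channels: int, t_channels: int):
        super().__init__()
        # One possible transform: a 1x1 conv aligning student features to teacher features.
        self.transform = nn.Conv2d(s_channels, t_channels, kernel_size=1)

    def loss(self, f_teacher: torch.Tensor, f_student: torch.Tensor) -> torch.Tensor:
        # L2 distance between transformed student features and (detached) teacher features.
        return F.mse_loss(self.transform(f_student), f_teacher.detach())


def weighted_kd_loss(pathways, alphas, t_feats, s_feats):
    # alphas: learnable importance weights, one per pathway at the current step.
    w = torch.softmax(alphas, dim=0)
    return sum(w[i] * p.loss(t_feats[i], s_feats[i]) for i, p in enumerate(pathways))


def search_step(student, teacher, pathways, alphas, opt_alpha,
                batch_train, batch_val, lr_inner=0.1):
    """One bi-level step: virtually update the student on the weighted KD loss,
    then update the pathway weights to reduce the validation loss of that virtual student."""
    x_tr, y_tr = batch_train
    x_val, y_val = batch_val

    with torch.no_grad():
        t_feats, _ = teacher(x_tr)          # assumed to return (features, logits)
    s_feats, s_logits = student(x_tr)
    inner = F.cross_entropy(s_logits, y_tr) + weighted_kd_loss(pathways, alphas, t_feats, s_feats)

    # One-step unrolled student update; create_graph=True keeps the dependence on alphas.
    params = dict(student.named_parameters())
    grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
    virtual = {n: p - lr_inner * g for (n, p), g in zip(params.items(), grads)}

    # Outer objective: validation loss of the virtual student, differentiated w.r.t. alphas.
    _, val_logits = functional_call(student, virtual, (x_val,))
    opt_alpha.zero_grad()
    F.cross_entropy(val_logits, y_val).backward()
    opt_alpha.step()
```

In this sketch, alphas would be created as nn.Parameter(torch.zeros(len(pathways))) with its own optimizer. In the full method the importance weights are re-estimated at every distillation step, so they form a process rather than a single vector, and it is this per-step schedule that is reused in the final distillation stage.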


