Conditional Generative Data-Free Knowledge Distillation based on Attention Transfer

12/31/2021
by Xinyi Yu, et al.

Knowledge distillation has made remarkable achievements in model compression. However, most existing methods require the original training data, which in practice is often unavailable due to privacy, security and transmission limitations. To address this problem, we propose a conditional generative data-free knowledge distillation (CGDD) framework that trains an efficient portable network without any real data. In this framework, besides the knowledge extracted from the teacher model, we introduce preset labels as additional auxiliary information to train the generator, so that the trained generator can produce meaningful training samples of any specified category on demand. To promote the distillation process, in addition to the conventional distillation loss, we treat the preset label as the ground-truth label, so that the student network is directly supervised by the category of its synthetic training sample. Moreover, we force the student network to mimic the attention maps of the teacher model, which further improves its performance. To verify the superiority of our method, we design a new evaluation metric, called relative accuracy, to directly compare the effectiveness of different distillation methods. The portable network trained with the proposed data-free distillation method obtains 99.63% and 99.84% relative accuracy on the evaluated benchmarks. Extensive experimental results demonstrate the superiority of the proposed method.
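To make the student-side objective concrete, the sketch below shows one way the three losses described in the abstract could be combined in PyTorch. All names here (cgdd_student_loss, attention_map, the weights T, alpha, beta, gamma) are illustrative assumptions rather than the paper's released code, the attention maps follow the standard Zagoruyko and Komodakis formulation of attention transfer, and the relative_accuracy definition is one plausible reading of the metric consistent with the abstract.

import torch
import torch.nn.functional as F

def attention_map(feat):
    # Spatial attention map: channel-wise squared mean of a (B, C, H, W)
    # activation, flattened and L2-normalised per sample.
    a = feat.pow(2).mean(dim=1)               # (B, H, W)
    return F.normalize(a.flatten(1), dim=1)   # (B, H*W)

def cgdd_student_loss(student_logits, teacher_logits, preset_labels,
                      student_feats, teacher_feats,
                      T=4.0, alpha=1.0, beta=1.0, gamma=1.0):
    # 1) Conventional distillation loss: KL divergence between the
    #    temperature-softened teacher and student distributions.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # 2) Hard-label loss: the preset label that conditioned the generator
    #    is treated as the ground-truth class of the synthetic sample.
    ce = F.cross_entropy(student_logits, preset_labels)
    # 3) Attention transfer: match normalised spatial attention maps at
    #    corresponding layers of teacher and student.
    at = sum(F.mse_loss(attention_map(s), attention_map(t))
             for s, t in zip(student_feats, teacher_feats))
    return alpha * kd + beta * ce + gamma * at

def relative_accuracy(student_acc, teacher_acc):
    # Assumed definition: student accuracy expressed as a percentage of
    # the teacher's accuracy on the same test set.
    return 100.0 * student_acc / teacher_acc

In the full framework, the same preset_labels tensor (e.g. torch.randint(0, num_classes, (batch_size,))) would also condition the generator that synthesises the training batch, so a single label both steers sample synthesis and supervises the student.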


Related research:
- Zero-Shot Knowledge Distillation in Deep Networks (05/20/2019): Knowledge distillation deals with the problem of training a smaller mode...
- Robustness and Diversity Seeking Data-Free Knowledge Distillation (11/07/2020): Knowledge distillation (KD) has enabled remarkable progress in model com...
- Large-Scale Generative Data-Free Distillation (12/10/2020): Knowledge distillation is one of the most popular and effective techniqu...
- Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay (01/09/2022): Data-Free Knowledge Distillation (KD) allows knowledge transfer from a t...
- Label-guided Attention Distillation for Lane Segmentation (04/04/2023): Contemporary segmentation methods are usually based on deep fully convol...
- ClonalNet: Classifying Better by Focusing on Confusing Categories (10/14/2021): Existing neural classification networks predominately adopt one-hot enco...
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN (03/20/2020): Recent advances in deep learning have provided procedures for learning o...
