Dual Discriminator Adversarial Distillation for Data-free Model Compression

04/12/2021
by   Haoran Zhao, et al.

Knowledge distillation has been widely used to produce portable and efficient neural networks that can be readily deployed on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods require access to the original training data, which is usually large and often unavailable. To tackle this problem, we propose in this paper a novel data-free approach, named Dual Discriminator Adversarial Distillation (DDAD), to distill a neural network without any training data or meta-data. Specifically, we use a generator, trained through dual discriminator adversarial distillation, to create samples that mimic the original training data. The generator not only exploits the pre-trained teacher's intrinsic statistics stored in its batch normalization layers but is also driven to maximize its discrepancy with the student model. The generated samples are then used to train the compact student network under the supervision of the teacher. The proposed method yields an efficient student network that closely approximates its teacher, despite using no original training data. Extensive experiments on the CIFAR-10, CIFAR-100 and Caltech101 datasets demonstrate the effectiveness of the proposed approach on classification tasks. Moreover, we extend our method to semantic segmentation tasks on several public datasets such as CamVid and NYUv2. All experiments show that our method outperforms all baselines for data-free knowledge distillation.
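To make the training procedure in the abstract concrete, below is a minimal PyTorch sketch of data-free adversarial distillation in the spirit described here: the generator is fitted to the running statistics stored in the teacher's batch normalization layers while being pushed to maximize the teacher-student discrepancy, and the student is then distilled on the generated samples. All names (Generator, bn_stat_loss, train_step, lambda_bn, lambda_adv), the network shapes and the exact loss forms are illustrative assumptions, not the authors' implementation; the teacher is assumed to be a frozen, BatchNorm-based image classifier.

```python
# Hypothetical sketch of data-free adversarial distillation (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps random noise to image-like samples (assumed 32x32 RGB, e.g. CIFAR)."""
    def __init__(self, nz=100, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.fc = nn.Linear(nz, 128 * (img_size // 4) ** 2)
        self.net = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        h = self.fc(z).view(z.size(0), 128, self.img_size // 4, self.img_size // 4)
        return self.net(h)

def bn_stat_loss(teacher, fake):
    """Match batch statistics of the generated samples to the running
    statistics stored in the teacher's BatchNorm layers (teacher assumed
    to contain at least one nn.BatchNorm2d)."""
    losses, handles = [], []

    def make_hook(bn):
        def hook(module, inputs, output):
            x = inputs[0]
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            losses.append(F.mse_loss(mean, module.running_mean) +
                          F.mse_loss(var, module.running_var))
        return bn.register_forward_hook(hook)

    for m in teacher.modules():
        if isinstance(m, nn.BatchNorm2d):
            handles.append(make_hook(m))
    teacher(fake)                      # forward pass just to trigger the hooks
    for h in handles:
        h.remove()
    return torch.stack(losses).sum()

def train_step(generator, teacher, student, opt_g, opt_s,
               nz=100, batch=64, lambda_bn=1.0, lambda_adv=1.0, device="cpu"):
    teacher.eval()                     # teacher is frozen; only used for supervision

    # --- Generator step: fit BN statistics, maximize teacher/student gap ---
    z = torch.randn(batch, nz, device=device)
    fake = generator(z)
    loss_bn = bn_stat_loss(teacher, fake)
    t_logits, s_logits = teacher(fake), student(fake)
    # Negative discrepancy: the generator seeks samples the student gets wrong.
    loss_g = lambda_bn * loss_bn - lambda_adv * F.l1_loss(s_logits, t_logits)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # --- Student step: imitate the teacher on freshly generated samples ---
    fake = generator(torch.randn(batch, nz, device=device)).detach()
    with torch.no_grad():
        t_logits = teacher(fake)
    s_logits = student(fake)
    loss_s = F.kl_div(F.log_softmax(s_logits, dim=1),
                      F.softmax(t_logits, dim=1), reduction="batchmean")
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return loss_g.item(), loss_s.item()
```

In this sketch the two supervision signals named in the abstract appear as two loss terms: the BN-statistic term keeps the synthetic samples close to the statistics of the teacher's original training data, while the adversarial term drives the generator toward samples on which the student still disagrees with the teacher, so the student is continually confronted with examples it has not yet mastered.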


