Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation

09/21/2022
by Kien Do, et al.

Data-free Knowledge Distillation (DFKD) has attracted attention recently thanks to its appealing capability of transferring knowledge from a teacher network to a student network without using training data. The main idea is to use a generator to synthesize data for training the student. As the generator is updated, the distribution of the synthetic data changes. This distribution shift can be large if the generator and the student are trained adversarially, causing the student to forget the knowledge it acquired in previous steps. To alleviate this problem, we propose a simple yet effective method called Momentum Adversarial Distillation (MAD), which maintains an exponential moving average (EMA) copy of the generator and uses synthetic samples from both the generator and the EMA generator to train the student. Since the EMA generator can be considered an ensemble of the generator's past versions and typically changes less between updates than the generator itself, training on its synthetic samples helps the student recall past knowledge and prevents it from adapting too quickly to new updates of the generator. Our experiments on six benchmark datasets, including large datasets such as ImageNet and Places365, demonstrate the superior performance of MAD over competing methods for handling the large distribution shift problem. Our method also compares favorably to existing DFKD methods and even achieves state-of-the-art results in some cases.
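
The abstract describes two moving parts: an EMA copy of the generator and a student update that uses synthetic samples from both the current generator and its EMA copy. The snippet below is a minimal, hypothetical PyTorch-style sketch of those two steps; the module names (teacher, student, generator, ema_generator), the KL distillation loss, and the momentum value 0.999 are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(ema_generator, generator, momentum=0.999):
    """Move the EMA generator's weights toward the current generator (assumed momentum value)."""
    for p_ema, p in zip(ema_generator.parameters(), generator.parameters()):
        p_ema.mul_(momentum).add_(p, alpha=1.0 - momentum)


def student_step(teacher, student, generator, ema_generator, opt_s,
                 z_dim=100, batch=64, device="cuda"):
    """One illustrative student update on samples from the generator and its EMA copy."""
    # Synthetic samples from the current generator and from the EMA generator.
    x_new = generator(torch.randn(batch, z_dim, device=device)).detach()
    x_old = ema_generator(torch.randn(batch, z_dim, device=device)).detach()

    loss = 0.0
    for x in (x_new, x_old):
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = student(x)
        # Distill the teacher into the student on both sample streams (KL divergence).
        loss = loss + F.kl_div(
            F.log_softmax(s_logits, dim=1),
            F.softmax(t_logits, dim=1),
            reduction="batchmean",
        )

    opt_s.zero_grad()
    loss.backward()
    opt_s.step()
    return loss.item()
```

In a full pipeline, the EMA generator would typically be initialized as a copy of the generator and ema_update would be called after each generator update; the adversarial generator step itself is omitted from this sketch.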

Related research

01/11/2023: Synthetic data generation method for data-free knowledge distillation in regression neural networks
02/28/2023: Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
09/18/2023: Dual Student Networks for Data-Free Model Stealing
12/23/2019: Data-Free Adversarial Distillation
01/09/2022: Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay
09/15/2022: On-Device Domain Generalization
12/12/2021: Up to 100x Faster Data-free Knowledge Distillation
