Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay

01/09/2022
by Kuluhan Binici, et al.

Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data. Existing works use a validation set to monitor the accuracy of the student over real data and report the highest performance achieved throughout the entire process. However, validation data may not be available at distillation time either, making it infeasible to record the student snapshot that achieved the peak accuracy. Therefore, a practical data-free KD method should be robust and ideally provide monotonically increasing student accuracy during distillation. This is challenging because the student experiences knowledge degradation due to the distribution shift of the synthetic data. A straightforward approach to overcome this issue is to store and rehearse the generated samples periodically, which increases the memory footprint and creates privacy concerns. We propose to model the distribution of the previously observed synthetic samples with a generative network. In particular, we design a Variational Autoencoder (VAE) with a training objective that is customized to learn the synthetic data representations optimally. The student is rehearsed by the generative pseudo replay technique, with samples produced by the VAE. Hence, knowledge degradation can be prevented without storing any samples. Experiments on image classification benchmarks show that our method optimizes the expected value of the distilled model accuracy while eliminating the large memory overhead incurred by the sample-storing methods.
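To make the idea concrete, the sketch below (PyTorch) is a minimal illustration of generative pseudo replay for data-free KD, not the authors' implementation: a generator produces synthetic batches, a small VAE is trained to model those batches, and the student is distilled on both fresh generator samples and VAE replay samples, so no synthetic data needs to be stored. All module sizes, hyperparameters, and the plain ELBO objective for the VAE are illustrative assumptions (the paper customizes the VAE objective), and the generator's own training step is omitted.

# Minimal sketch of data-free KD with generative pseudo replay.
# Architectures, hyperparameters, and the standard VAE loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps noise to synthetic inputs that the student is distilled on."""
    def __init__(self, z_dim=100, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

class VAE(nn.Module):
    """Models the distribution of previously generated synthetic samples."""
    def __init__(self, img_dim=28 * 28, z_dim=32):
        super().__init__()
        self.z_dim = z_dim
        self.enc = nn.Linear(img_dim, 128)
        self.mu = nn.Linear(128, z_dim)
        self.logvar = nn.Linear(128, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, img_dim), nn.Tanh())

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

    def sample(self, n):
        """Pseudo-replay samples approximating past synthetic data."""
        return self.dec(torch.randn(n, self.z_dim))

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation loss: KL between temperature-softened outputs."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T

def distill_step(teacher, student, generator, vae, opt_s, opt_v, batch=64, z_dim=100):
    # 1) Current synthetic batch from the generator.
    x_new = generator(torch.randn(batch, z_dim)).detach()
    # 2) Replay batch drawn from the VAE instead of a stored sample buffer.
    x_old = vae.sample(batch).detach()
    # 3) Distill on both, so earlier knowledge is rehearsed without storing samples.
    opt_s.zero_grad()
    loss_s = kd_loss(student(x_new), teacher(x_new).detach()) + \
             kd_loss(student(x_old), teacher(x_old).detach())
    loss_s.backward()
    opt_s.step()
    # 4) Update the VAE on the current synthetic batch (reconstruction + KL prior term).
    opt_v.zero_grad()
    recon, mu, logvar = vae(x_new)
    loss_v = F.mse_loss(recon, x_new) - \
             0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss_v.backward()
    opt_v.step()

In practice, the teacher and student would be the actual classifier architectures being compressed; the replay term is what keeps the student's accuracy from degrading as the generator's output distribution drifts over the course of distillation.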

Related research

Synthetic data generation method for data-free knowledge distillation in regression neural networks (01/11/2023)
Knowledge distillation is the technique of compressing a larger neural n...

Conditional Generative Data-Free Knowledge Distillation based on Attention Transfer (12/31/2021)
Knowledge distillation has made remarkable achievements in model compres...

Preventing Catastrophic Forgetting and Distribution Mismatch in Knowledge Distillation via Synthetic Data (08/11/2021)
With the increasing popularity of deep learning on edge devices, compres...

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation (02/28/2023)
Data-free Knowledge Distillation (DFKD) has gained popularity recently, ...

Beyond Classification: Knowledge Distillation using Multi-Object Impressions (10/27/2021)
Knowledge Distillation (KD) utilizes training data as a transfer set to ...

Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation (09/21/2022)
Data-free Knowledge Distillation (DFKD) has attracted attention recently...

Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue (10/14/2022)
Lifelong learning (LL) is vital for advanced task-oriented dialogue (ToD...
