Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

02/28/2023
by Gaurav Patel, et al.

Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher neural network to a Student neural network in the absence of training data. However, in the Adversarial DFKD framework, the student network's accuracy suffers due to the non-stationary distribution of the pseudo-samples under multiple generator updates. To address this, at every generator update, we aim to maintain the student's performance on previously encountered examples while acquiring knowledge from samples of the current distribution. Thus, we propose a meta-learning-inspired framework that treats Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge on previously encountered samples) as the meta-train and meta-test tasks, respectively; hence, we dub our method Learning to Retain while Acquiring. Moreover, we identify an implicit aligning factor between the Knowledge-Retention and Knowledge-Acquisition tasks, indicating that the proposed student update strategy enforces a common gradient direction for both tasks and alleviates interference between the two objectives. Finally, we support our hypothesis through extensive evaluation and comparison of our method with prior art on multiple datasets.
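The abstract describes the student update only at a high level. The snippet below is a minimal, first-order PyTorch sketch of a meta-learning-style update of this kind, not the authors' implementation: distillation on freshly generated pseudo-samples plays the meta-train (Knowledge-Acquisition) role, and distillation on replayed samples plays the meta-test (Knowledge-Retention) role. The names `generator`, `teacher`, `student`, `replay_buffer`, `kd_loss`, and all hyperparameters are illustrative assumptions.

```python
# Minimal, first-order sketch of a meta-learning-style student update for
# adversarial DFKD. NOT the authors' code; all names and hyperparameters
# below are placeholders.
import copy
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    # Standard KL-divergence distillation loss with temperature T.
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)


def student_meta_update(student, teacher, generator, replay_buffer, optimizer,
                        z_dim=100, batch_size=64, inner_lr=0.01):
    # Meta-train (Knowledge-Acquisition): distill on newly generated pseudo-samples.
    z = torch.randn(batch_size, z_dim)
    x_new = generator(z).detach()
    acq_loss = kd_loss(student(x_new), teacher(x_new).detach())
    acq_grads = torch.autograd.grad(acq_loss, list(student.parameters()))

    # Virtual inner step: one SGD step on a copy of the student.
    fast_student = copy.deepcopy(student)
    with torch.no_grad():
        for p, g in zip(fast_student.parameters(), acq_grads):
            p -= inner_lr * g

    # Meta-test (Knowledge-Retention): distill on previously seen samples
    # drawn from a memory of earlier pseudo-samples (replay_buffer is assumed).
    x_old = replay_buffer.sample(batch_size)
    ret_loss = kd_loss(fast_student(x_old), teacher(x_old).detach())
    ret_grads = torch.autograd.grad(ret_loss, list(fast_student.parameters()))

    # Combined first-order update of the original student. Because the
    # retention loss is evaluated after the virtual acquisition step, its
    # first-order expansion contains the inner product of the two task
    # gradients, echoing the gradient-alignment argument in the abstract.
    optimizer.zero_grad()
    for p, g_a, g_r in zip(student.parameters(), acq_grads, ret_grads):
        p.grad = g_a + g_r
    optimizer.step()
    return acq_loss.item(), ret_loss.item()
```

In an adversarial DFKD loop, such an update would be interleaved with generator updates, with newly generated batches also added to the replay memory.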


Related research

09/21/2022
Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation
Data-free Knowledge Distillation (DFKD) has attracted attention recently...

04/12/2021
Dual Discriminator Adversarial Distillation for Data-free Model Compression
Knowledge distillation has been widely used to produce portable and effi...

06/08/2021
Meta Learning for Knowledge Distillation
We present Meta Learning for Knowledge Distillation (MetaDistil), a simp...

01/09/2022
Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay
Data-Free Knowledge Distillation (KD) allows knowledge transfer from a t...

06/14/2022
SoTeacher: A Student-oriented Teacher Network Training Framework for Knowledge Distillation
How to train an ideal teacher for knowledge distillation is still an ope...

09/18/2023
Dual Student Networks for Data-Free Model Stealing
Existing data-free model stealing methods use a generator to produce sam...

08/27/2020
MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
Knowledge Distillation (KD) has been one of the most popular methods to...
