Zero-Shot Knowledge Distillation in Deep Networks

05/20/2019
by Gaurav Kumar Nayak, et al.

Knowledge distillation deals with the problem of training a smaller model (Student) from a high-capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible, either because the dataset is very large or because it raises privacy or safety concerns (e.g., biometric or medical data). Hence, in this paper we propose a novel data-free method to train the Student from the Teacher. Without using any meta-data, we synthesize Data Impressions from the complex Teacher model and use them as surrogates for the original training data samples to transfer the Teacher's learning to the Student via knowledge distillation. We therefore dub our method "Zero-Shot Knowledge Distillation" and demonstrate on multiple benchmark datasets that it achieves generalization performance competitive with distillation using the actual training data samples.
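
The idea can be summarized in two steps: synthesize inputs (Data Impressions) that the Teacher responds to with chosen soft labels, then run standard temperature-scaled distillation on those synthetic inputs in place of the real data. The sketch below is a minimal illustration in PyTorch, loosely following the authors' published approach of sampling soft-label targets from a Dirichlet distribution and optimizing random inputs toward them; it is not the authors' code. The hyperparameters (beta, temperature, steps, learning rate) and the fixed Dirichlet concentration are illustrative assumptions, and `teacher`/`student` stand for any compatible classification models.

```python
# Illustrative sketch of Zero-Shot Knowledge Distillation (not the authors' code).
# Assumptions: `teacher` and `student` are PyTorch classifiers over the same
# input/output spaces; beta, temperature, steps and lr are placeholders.
import torch
import torch.nn.functional as F


def sample_soft_label(num_classes, target_class, beta=1.0):
    """Sample a target softmax vector from a Dirichlet distribution peaked at
    `target_class`. A fixed concentration is assumed here for brevity."""
    concentration = torch.full((num_classes,), 0.1 * beta)
    concentration[target_class] = beta
    return torch.distributions.Dirichlet(concentration).sample().unsqueeze(0)


def synthesize_data_impression(teacher, soft_label, input_shape,
                               steps=1500, lr=0.01, temperature=20.0):
    """Optimize a random input until the Teacher's softened softmax matches the
    sampled soft label; the optimized input is one Data Impression."""
    teacher.eval()
    x = torch.randn(1, *input_shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        log_p = F.log_softmax(teacher(x) / temperature, dim=1)
        loss = -(soft_label * log_p).sum(dim=1).mean()  # cross-entropy to the target
        loss.backward()
        optimizer.step()
    return x.detach()


def distillation_loss(student_logits, teacher_logits, temperature=20.0):
    """Standard temperature-scaled KD loss, applied on Data Impressions instead
    of the original (inaccessible) training data."""
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2
```

Repeating the synthesis for soft labels sampled across all classes yields a transfer set; the Student is then trained by minimizing the distillation loss between its outputs and the Teacher's outputs on those impressions.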

Related research

12/31/2020 - Towards Zero-Shot Knowledge Distillation for Natural Language Processing
10/27/2021 - Beyond Classification: Knowledge Distillation using Multi-Object Impressions
12/02/2020 - Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains
07/21/2023 - Distribution Shift Matters for Knowledge Distillation with Webly Collected Images
12/09/2020 - Progressive Network Grafting for Few-Shot Knowledge Distillation
12/31/2021 - Conditional Generative Data-Free Knowledge Distillation based on Attention Transfer
11/18/2020 - Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation
