Towards Zero-Shot Knowledge Distillation for Natural Language Processing

12/31/2020
by Ahmad Rashid et al.

Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep-learning-based natural language processing (NLP) solutions. In its regular manifestations, KD requires access to the teacher's training data in order to transfer knowledge to the student network. However, privacy concerns, data regulations, and proprietary restrictions may prevent access to such data. We present, to the best of our knowledge, the first work on Zero-Shot Knowledge Distillation for NLP, where the student learns from the much larger teacher without any task-specific data. Our solution combines out-of-domain data and adversarial training to learn the teacher's output distribution. We investigate six tasks from the GLUE benchmark and demonstrate that we can retain at least 75% of the teacher's classification score (accuracy or F1) while compressing the model 30 times.
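To make the setup concrete, below is a minimal PyTorch sketch of the two ingredients the abstract mentions: the standard soft-label distillation loss, and an adversarial step that perturbs out-of-domain inputs toward maximal teacher-student disagreement. The function names, signatures, and hyperparameters here are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soft-label KD objective (Hinton et al., 2015): KL divergence between
    # the teacher's and the student's temperature-softened distributions.
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence;
    # the t**2 factor keeps gradients comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t ** 2)

def adversarial_inputs(embeddings, student, teacher, steps=3, step_size=1e-2):
    # Hypothetical sketch of the adversarial step: perturb the embeddings of
    # out-of-domain text in the direction that maximizes teacher-student
    # disagreement, so the student is subsequently trained exactly where it
    # matches the teacher worst. `student` and `teacher` are assumed to be
    # callables mapping embeddings to classification logits.
    delta = torch.zeros_like(embeddings, requires_grad=True)
    for _ in range(steps):
        divergence = distillation_loss(student(embeddings + delta),
                                       teacher(embeddings + delta))
        grad, = torch.autograd.grad(divergence, delta)
        delta = (delta + step_size * grad.sign()).detach().requires_grad_(True)
    return (embeddings + delta).detach()
```

In a training loop of this kind, the teacher stays frozen while the student is updated to minimize distillation_loss on both the original out-of-domain batch and its adversarial counterpart.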

Related research

05/20/2019 - Zero-Shot Knowledge Distillation in Deep Networks
Knowledge distillation deals with the problem of training a smaller mode...

05/24/2023 - Large Language Model Distillation Doesn't Need a Teacher
Knowledge distillation trains a smaller student model to match the outpu...

01/27/2023 - Improved knowledge distillation by utilizing backward pass knowledge in neural networks
Knowledge distillation (KD) is one of the prominent techniques for model...

05/03/2023 - A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
Modern Natural Language Generation (NLG) models come with massive comput...

09/13/2021 - How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Knowledge Distillation (KD) is a model compression algorithm that helps ...

08/02/2019 - Self-Knowledge Distillation in Natural Language Processing
Since deep learning became a key player in natural language processing (...

10/27/2021 - Beyond Classification: Knowledge Distillation using Multi-Object Impressions
Knowledge Distillation (KD) utilizes training data as a transfer set to ...
