Improving Robustness in Knowledge Distillation Using Domain-Targeted Data Augmentation

05/22/2023
by Joe Stacey, et al.

Applying knowledge distillation encourages a student model to behave more like a teacher model, largely retaining the performance of the teacher even though the student may have substantially fewer parameters. However, while distillation helps student models behave more like teacher models in-distribution, this is not necessarily the case out-of-distribution. To address this, we use a language model to create task-specific unlabeled data that mimics the data in targeted out-of-distribution domains. We use this generated data for knowledge distillation on the task of Natural Language Inference (NLI), encouraging the student models to behave more like the teacher models on these examples. Our domain-targeted augmentation is highly effective and outperforms previous robustness methods when evaluating out-of-distribution performance on MNLI. Surprisingly, this method also improves performance on out-of-distribution domains that the data was not generated for. We additionally introduce Distilled Minority Upsampling (DMU), a method for identifying and upsampling minority examples during distillation. DMU is complementary to the domain-targeted augmentation and substantially improves performance on SNLI-hard. Finally, we show out-of-distribution improvements on HANS from both of our methods, despite augmenting the training data with fewer than 5k examples.
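The abstract describes two ingredients: distilling a student on unlabeled, language-model-generated NLI examples from targeted domains, and Distilled Minority Upsampling. The snippet below is a minimal sketch of the first ingredient only, assuming a standard temperature-scaled KL distillation objective; the checkpoints (roberta-large-mnli, distilroberta-base), the temperature, and the example generated pair are illustrative assumptions, not details taken from the paper.

```python
# Sketch: distill a small NLI student on generated, unlabeled premise/hypothesis pairs.
# The abstract only states that the student is trained to match the teacher on such data;
# the specific models, loss, and temperature below are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

teacher_name = "roberta-large-mnli"   # hypothetical teacher checkpoint
student_name = "distilroberta-base"   # hypothetical smaller student

tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_name).eval()
student = AutoModelForSequenceClassification.from_pretrained(student_name, num_labels=3)

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
T = 2.0  # softmax temperature (assumed)

# Stand-in for the domain-targeted data produced by a language model:
# each item is an unlabeled (premise, hypothesis) pair.
generated_pairs = [
    ("A man plays a guitar on stage.", "A musician is performing."),
]

for premise, hypothesis in generated_pairs:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        teacher_logits = teacher(**inputs).logits
    student_logits = student(**inputs).logits

    # Soft-label distillation: the student matches the teacher's output distribution.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because the generated pairs carry no gold labels, only the teacher's soft predictions supervise the student; DMU (not shown) would additionally upsample minority examples within this distillation set.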


