CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation

04/15/2022
by Md. Akmal Haidar, et al.

Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models. Recent years have seen a surge of research aiming to improve KD by leveraging contrastive learning, intermediate-layer distillation, data augmentation, and adversarial training. In this work, we propose a learning-based data augmentation technique tailored for knowledge distillation, called CILDA. To the best of our knowledge, this is the first time that intermediate-layer representations of the main task are used to improve the quality of augmented samples. More precisely, we introduce an augmentation technique for KD based on intermediate-layer matching with a contrastive loss to improve masked adversarial data augmentation. CILDA outperforms existing state-of-the-art KD approaches on the GLUE benchmark, as well as in an out-of-domain evaluation.
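
The abstract describes matching student and teacher intermediate-layer representations with a contrastive loss to steer masked adversarial augmentation. The sketch below shows one common way such an intermediate-layer contrastive objective can be written in PyTorch (an InfoNCE-style loss over pooled hidden states); the function name, pooling choice, and temperature are illustrative assumptions, not the authors' released implementation.

    # Minimal sketch of contrastive intermediate-layer matching for KD.
    # Hypothetical names and shapes; not the paper's official code.
    import torch
    import torch.nn.functional as F

    def intermediate_contrastive_loss(student_hidden, teacher_hidden, temperature=0.1):
        """InfoNCE-style loss that pulls a student's pooled intermediate-layer
        representation toward the matching teacher representation and pushes it
        away from the teacher representations of other examples in the batch.

        student_hidden, teacher_hidden: (batch, dim) pooled intermediate states.
        """
        s = F.normalize(student_hidden, dim=-1)            # (B, D) unit-norm student states
        t = F.normalize(teacher_hidden, dim=-1)            # (B, D) unit-norm teacher states
        logits = s @ t.T / temperature                     # (B, B) similarity matrix
        targets = torch.arange(s.size(0), device=s.device) # positives lie on the diagonal
        return F.cross_entropy(logits, targets)

    if __name__ == "__main__":
        batch, dim = 8, 768
        loss = intermediate_contrastive_loss(torch.randn(batch, dim), torch.randn(batch, dim))
        print(loss.item())

In a setup of this kind, the usual KD loss on output logits would be combined with this term, and an adversarial generator producing masked augmentations could be trained to maximize it while the student minimizes it.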

Related research

05/12/2021
MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
The advent of large pre-trained language models has given rise to rapid ...

12/14/2020
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
The pre-training models such as BERT have achieved great results in vari...

12/05/2020
Knowledge Distillation Thrives on Data Augmentation
Knowledge distillation (KD) is a general deep neural network training fr...

10/21/2022
Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
Knowledge distillation is one of the primary methods of transferring kno...

05/28/2021
Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
In Natural Language Processing (NLP), finding data augmentation techniqu...

09/10/2023
Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch
Unsupervised contrastive learning methods have recently seen significant...

09/01/2020
Automatic Assignment of Radiology Examination Protocols Using Pre-trained Language Models with Knowledge Distillation
Selecting radiology examination protocol is a repetitive, error-prone, a...
