Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation

10/21/2022
by   Ziqi Wang, et al.

Knowledge distillation is one of the primary methods for transferring knowledge from large models to small ones. However, it requires massive amounts of task-specific data, which may not be available in many real-world applications. Data augmentation methods such as representation interpolation, token replacement, or model-based augmentation are applied to tackle this problem. However, these methods either potentially shift decision boundaries (representation interpolation), are not expressive enough (token replacement), or introduce too much computational overhead (model-based augmentation). To this end, we propose AugPro (Augmentation with Projection), an effective and efficient data augmentation method for distillation. Our method builds on representation-interpolation augmentation to preserve the diversity of expressions, and converts the augmented data back to discrete tokens to avoid shifting decision boundaries. It relies on simple operations that add little computational overhead. Results on multiple GLUE tasks show that our method improves distillation performance by a large margin at a low time cost.
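The core idea of "interpolation plus projection" can be illustrated with a small sketch. This is not the paper's implementation; it assumes a mixup-style interpolation of token embeddings and a nearest-neighbour projection back onto the vocabulary embedding table, with a toy random embedding table standing in for a real model's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary embedding table (vocab_size x dim); in practice this
# would come from the model being distilled.
vocab_size, dim = 50, 8
embedding_table = rng.normal(size=(vocab_size, dim))

def augment_with_projection(ids_a, ids_b, lam=0.7):
    """Interpolate two token sequences in embedding space (mixup-style),
    then project each interpolated vector back to its nearest vocabulary
    token so the augmented example is again a discrete token sequence."""
    emb_a = embedding_table[ids_a]             # (seq_len, dim)
    emb_b = embedding_table[ids_b]
    mixed = lam * emb_a + (1.0 - lam) * emb_b  # representation interpolation
    # Projection: nearest neighbour in the embedding table (Euclidean).
    dists = np.linalg.norm(mixed[:, None, :] - embedding_table[None, :, :],
                           axis=-1)            # (seq_len, vocab_size)
    return dists.argmin(axis=1)                # discrete token ids again

ids_a = rng.integers(0, vocab_size, size=6)
ids_b = rng.integers(0, vocab_size, size=6)
augmented_ids = augment_with_projection(ids_a, ids_b)
```

Because the output is a token sequence rather than a continuous embedding, it can be fed to both teacher and student like ordinary text, which is how projection avoids the off-manifold inputs that raw interpolation produces.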


research
05/21/2023

Understanding the Effect of Data Augmentation on Knowledge Distillation

Knowledge distillation (KD) requires sufficient data to transfer knowled...
research
04/15/2022

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation

Knowledge distillation (KD) is an efficient framework for compressing la...
research
06/29/2022

Teach me how to Interpolate a Myriad of Embeddings

Mixup refers to interpolation-based data augmentation, originally motiva...
research
07/03/2021

Isotonic Data Augmentation for Knowledge Distillation

Knowledge distillation uses both real hard labels and soft labels predic...
research
05/22/2023

Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study

The excellent performance of deep neural networks is usually accompanied...
research
06/09/2022

Extreme Masking for Learning Instance and Distributed Visual Representations

The paper presents a scalable approach for learning distributed represen...
research
02/28/2022

Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

Before entering the neural network, a token is generally converted to th...
