HARD: Hard Augmentations for Robust Distillation

05/24/2023
by Arne F. Nix, et al.

Knowledge distillation (KD) is a simple and successful method for transferring knowledge from a teacher to a student model solely based on functional activity. However, current KD has several shortcomings: it has recently been shown that the method fails to transfer simple inductive biases such as shift equivariance, it struggles to transfer out-of-domain generalization, and its optimization time is orders of magnitude longer than default non-KD model training. To improve these aspects of KD, we propose Hard Augmentations for Robust Distillation (HARD), a generally applicable data augmentation framework that generates synthetic data points on which the teacher and the student disagree. We show in a simple toy example that our augmentation framework solves the problem of transferring simple equivariances with KD. We then apply our framework to real-world tasks with a variety of augmentation models, ranging from simple spatial transformations to unconstrained image manipulations with a pretrained variational autoencoder. We find that our learned augmentations significantly improve KD performance on both in-domain and out-of-domain evaluations. Moreover, our method outperforms even state-of-the-art data augmentations, and because the augmented training inputs can be visualized, they offer qualitative insight into the properties that are transferred from the teacher to the student. Thus, HARD represents a generally applicable, dynamically optimized data augmentation technique tailored to improve the generalization and convergence speed of models trained with KD.
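The core mechanism described in the abstract, an augmentation model trained to produce inputs on which teacher and student disagree while the student is distilled on those inputs, can be sketched as follows. This is a minimal PyTorch-style sketch under assumed names (`aug_model`, `teacher`, `student`, `tau`, `alpha`); it illustrates the general idea only and is not the authors' implementation.

```python
# Sketch of adversarial augmentation for KD: the augmentation model is
# updated to maximize teacher-student disagreement, and the student is
# then distilled on the resulting augmented inputs.
# All module and hyperparameter names here are illustrative assumptions.
import torch
import torch.nn.functional as F


def disagreement(teacher_logits, student_logits, tau=4.0):
    """KL divergence between teacher and student soft predictions."""
    t = F.log_softmax(teacher_logits / tau, dim=-1)
    s = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean")


def hard_step(x, y, teacher, student, aug_model,
              opt_student, opt_aug, alpha=0.5):
    # 1) Update the augmentation model to *maximize* disagreement on the
    #    augmented inputs (gradient ascent via the negated loss).
    x_aug = aug_model(x)
    with torch.no_grad():
        t_logits = teacher(x_aug)
    s_logits = student(x_aug)
    aug_loss = -disagreement(t_logits, s_logits)
    opt_aug.zero_grad()
    aug_loss.backward()
    opt_aug.step()

    # 2) Distill the student on original + augmented inputs with a
    #    standard KD objective (label cross-entropy + soft-target loss).
    x_all = torch.cat([x, aug_model(x).detach()], dim=0)
    with torch.no_grad():
        t_logits = teacher(x_all)
    s_logits = student(x_all)
    kd_loss = disagreement(t_logits, s_logits)
    ce_loss = F.cross_entropy(s_logits[: x.size(0)], y)
    student_loss = alpha * ce_loss + (1 - alpha) * kd_loss
    opt_student.zero_grad()
    student_loss.backward()
    opt_student.step()
    return student_loss.item()
```

In the paper, the augmentation model ranges from constrained spatial transformations to a pretrained variational autoencoder acting in latent space; the sketch above abstracts this away behind `aug_model`.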
