Isotonic Data Augmentation for Knowledge Distillation

07/03/2021
by Wanyun Cui, et al.

Knowledge distillation uses both real hard labels and soft labels predicted by teacher models as supervision. Intuitively, we expect the soft labels and hard labels to be concordant w.r.t. their orders of probabilities. However, we found critical order violations between hard labels and soft labels in augmented samples. For example, for an augmented sample x=0.7*panda+0.3*cat, we expect the order of meaningful soft labels to be P_soft(panda|x)>P_soft(cat|x)>P_soft(other|x). But real soft labels usually violate this order, e.g. P_soft(tiger|x)>P_soft(panda|x)>P_soft(cat|x). We attribute this to the unsatisfactory generalization ability of the teacher, which leads to prediction errors on augmented samples. Empirically, we found that such violations are common and harm knowledge transfer. In this paper, we introduce order restrictions to data augmentation for knowledge distillation, which we denote as isotonic data augmentation (IDA). We use isotonic regression (IR), a classic technique from statistics, to eliminate the order violations. We show that IDA can be modeled as a tree-structured IR problem, and we adapt the classical IRT-BIN algorithm to obtain optimal solutions in O(c log c) time, where c is the number of labels. To further reduce the time complexity, we also propose a GPU-friendly approximation with linear time complexity. We have verified on various datasets and data augmentation techniques that our proposed IDA algorithms effectively increase the accuracy of knowledge distillation by eliminating the order violations.
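To make the order constraint concrete, the following Python sketch projects teacher soft labels onto the expected order for a two-class mixup sample using a squared-error (isotonic-regression-style) projection. It is an illustrative simplification for the two-class case only, not the paper's IRT-BIN solver or its linear-time approximation; the function name, the index arguments, the example numbers, the reduction to a one-dimensional convex problem, and the final renormalization are assumptions made for this sketch.

import numpy as np
from scipy.optimize import minimize_scalar

def project_soft_labels(p, idx_major, idx_minor):
    # L2-project teacher soft labels p onto the order constraints implied by a
    # two-class mixup x = lam*x_major + (1-lam)*x_minor with lam >= 0.5:
    #     q[idx_major] >= q[idx_minor] >= q[j]  for every other class j.
    # For a fixed value t of q[idx_minor], the optimal q clips the remaining
    # entries: q[idx_major] = max(p[idx_major], t) and q[j] = min(p[j], t),
    # so the projection reduces to minimizing a 1-D convex function of t.
    # Sketch for the two-class case only, not the paper's general O(c log c)
    # tree-structured IRT-BIN solver.
    p = np.asarray(p, dtype=float)
    others = np.array([j for j in range(len(p)) if j not in (idx_major, idx_minor)])

    def objective(t):
        cost = (t - p[idx_minor]) ** 2
        cost += (max(p[idx_major], t) - p[idx_major]) ** 2
        cost += np.sum((np.minimum(p[others], t) - p[others]) ** 2)
        return cost

    # The objective is convex and piecewise quadratic on [0, 1].
    t = minimize_scalar(objective, bounds=(0.0, 1.0), method="bounded").x

    q = p.copy()
    q[idx_minor] = t
    q[idx_major] = max(p[idx_major], t)
    q[others] = np.minimum(p[others], t)
    return q / q.sum()  # renormalizing to a distribution is an assumption here

# Example in the spirit of the abstract: for x = 0.7*panda + 0.3*cat the teacher
# wrongly ranks tiger first; the projection restores P(panda) >= P(cat) >= P(other).
labels = ["panda", "cat", "tiger", "dog"]
soft = np.array([0.30, 0.15, 0.45, 0.10])
print(dict(zip(labels, np.round(project_soft_labels(soft, 0, 1), 3))))

On this toy input the projection pools the violating probabilities (panda, cat, and tiger all end up at 0.3, ties being standard isotonic-regression behavior), restoring the expected order while staying as close as possible in squared error to the teacher's original soft labels. The paper's IRT-BIN algorithm solves the general tree-structured problem exactly in O(c log c).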


