Role-Wise Data Augmentation for Knowledge Distillation

04/19/2020
by Jie Fu, et al.

Knowledge Distillation (KD) is a common method for transferring the “knowledge” learned by one machine learning model (the teacher) into another model (the student), where the teacher typically has greater capacity (e.g., more parameters or higher bit-widths). To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data, and this data is the only medium through which the teacher's knowledge can be demonstrated. Because of the gap in model capacities, the student may not benefit fully from the same data points on which the teacher is trained. A human teacher, by contrast, may demonstrate a piece of knowledge with individualized examples adapted to a particular student, for instance to her cultural background and interests. Inspired by this behavior, we design data augmentation agents with distinct roles to facilitate knowledge distillation: the agents generate distinct training data for the teacher and the student, respectively. We find empirically that such specially tailored data points allow the teacher's knowledge to be demonstrated more effectively to the student. We compare our approach with existing KD methods on training popular neural architectures and show that role-wise data augmentation improves the effectiveness of KD over strong prior approaches. The code for reproducing our results can be found at https://github.com/bigaidream-projects/role-kd
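
The abstract describes two augmentation agents that feed role-specific views of the same underlying data to the teacher and the student during distillation. The sketch below illustrates that setup only; it is not the authors' implementation (see the linked repository for that). The Hinton-style soft-target loss, the noise-based placeholder augmentations, the toy models, and names such as RoleWiseBatch and distill_step are assumptions made for illustration.

# Minimal sketch of role-wise data augmentation for knowledge distillation.
# Assumptions (not from the paper's code): fixed noise transforms stand in for
# the augmentation agents, and the KD loss is the standard Hinton-style
# KL divergence between softened teacher and student logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

class RoleWiseBatch:
    """Applies a role-specific augmentation to the same raw batch, so the
    teacher and student each see data tailored to their role."""
    def __init__(self, teacher_aug, student_aug):
        self.teacher_aug = teacher_aug  # augmentation for the teacher's input
        self.student_aug = student_aug  # augmentation for the student's input

    def __call__(self, x):
        return self.teacher_aug(x), self.student_aug(x)

def distill_step(teacher, student, optimizer, batch, labels, role_aug):
    """One distillation step on differently augmented views of one batch."""
    x_teacher, x_student = role_aug(batch)
    with torch.no_grad():
        t_logits = teacher(x_teacher)   # teacher runs on its own view
    s_logits = student(x_student)       # student runs on its tailored view
    loss = kd_loss(s_logits, t_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy models and random data purely for illustration.
    teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 10))
    student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
    optimizer = torch.optim.SGD(student.parameters(), lr=0.05)

    # Hypothetical role-specific augmentations: light noise for the teacher,
    # stronger noise for the student (placeholders for the paper's agents).
    role_aug = RoleWiseBatch(
        teacher_aug=lambda x: x + 0.01 * torch.randn_like(x),
        student_aug=lambda x: x + 0.10 * torch.randn_like(x),
    )

    x = torch.randn(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    print(distill_step(teacher, student, optimizer, x, y, role_aug))

The key design point the sketch captures is that teacher and student share the raw batch but consume different augmented views of it; in the paper these views come from dedicated augmentation agents rather than the fixed perturbations used here.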

Related research

Local Region Knowledge Distillation (10/09/2020)
Knowledge distillation (KD) is an effective technique to transfer knowle...

HARD: Hard Augmentations for Robust Distillation (05/24/2023)
Knowledge distillation (KD) is a simple and successful method to transfe...

Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation (12/06/2019)
Sequence-level knowledge distillation (SLKD) is a model compression tech...

Improving Robustness in Knowledge Distillation Using Domain-Targeted Data Augmentation (05/22/2023)
Applying knowledge distillation encourages a student model to behave mor...

Lightweight Self-Knowledge Distillation with Multi-source Information Fusion (05/16/2023)
Knowledge Distillation (KD) is a powerful technique for transferring kno...

Rethinking Data Augmentation in Knowledge Distillation for Object Detection (09/20/2022)
Knowledge distillation (KD) has shown its effectiveness for object detec...

Learning What You Should Learn (12/11/2022)
In real teaching scenarios, an excellent teacher always teaches what he ...
