Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study

11/08/2022
by Hongjun Choi, et al.

Mixup is a popular data augmentation technique that creates new samples by linearly interpolating between two given data samples, improving both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, and involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different; however, we find that "smoothness" is the connecting link between the two and is also a crucial attribute for understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of mixup in knowledge distillation. In this paper, we present a detailed empirical study of the important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of networks trained with mixup in the light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies for guiding the student network to enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners who commonly use KD techniques. Our code is available at https://github.com/hchoi71/MIX-KD.
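To make the two techniques concrete, the sketch below shows a minimal PyTorch training step that combines mixup interpolation with a standard soft-target KD loss. It is an illustrative sketch only: the mixup, kd_loss, and train_step helpers, along with hyperparameters such as alpha, the temperature T, and the loss weight beta, are generic placeholders rather than the paper's exact training recipe (see the linked repository for that).

# Minimal sketch of mixup combined with a soft-target KD loss (PyTorch).
# Models and hyperparameters are placeholders, not the authors' settings.
import torch
import torch.nn.functional as F

def mixup(x, y, alpha=0.2):
    """Linearly interpolate a batch with a shuffled copy of itself."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    return x_mix, y, y[perm], lam

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def train_step(student, teacher, x, y, optimizer, alpha=0.2, T=4.0, beta=0.5):
    """One training step: mix the inputs, then blend cross-entropy and KD losses."""
    x_mix, y_a, y_b, lam = mixup(x, y, alpha)
    with torch.no_grad():
        t_logits = teacher(x_mix)  # teacher guidance on the mixed input
    s_logits = student(x_mix)
    ce = lam * F.cross_entropy(s_logits, y_a) + (1.0 - lam) * F.cross_entropy(s_logits, y_b)
    loss = (1.0 - beta) * ce + beta * kd_loss(s_logits, t_logits, T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()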

Related research

What Knowledge Gets Distilled in Knowledge Distillation? (05/31/2022)
Knowledge distillation aims to transfer useful information from a teache...

MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization (10/27/2019)
In this paper, we present and discuss a deep mixture model with online k...

Class Attention Transfer Based Knowledge Distillation (04/25/2023)
Previous knowledge distillation methods have shown their impressive perf...

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding (09/13/2021)
Knowledge Distillation (KD) is a model compression algorithm that helps ...

torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation (11/25/2020)
While knowledge distillation (transfer) has been attracting attentions f...

Neural Compatibility Modeling with Attentive Knowledge Distillation (04/17/2018)
Recently, the booming fashion sector and its huge potential benefits hav...

KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation (05/10/2021)
Knowledge distillation (KD) has recently emerged as an efficacious schem...