Not All Knowledge Is Created Equal

by   Ziyun Li, et al.

Mutual knowledge distillation (MKD) improves a model by distilling knowledge from another model. However, not all knowledge is certain and correct, especially under adverse conditions. For example, label noise usually leads to less reliable models due to the undesired memorisation [1, 2]. Wrong knowledge misleads the learning rather than helps. This problem can be handled by two aspects: (i) improving the reliability of a model where the knowledge is from (i.e., knowledge source's reliability); (ii) selecting reliable knowledge for distillation. In the literature, making a model more reliable is widely studied while selective MKD receives little attention. Therefore, we focus on studying selective MKD and highlight its importance in this work. Concretely, a generic MKD framework, Confident knowledge selection followed by Mutual Distillation (CMD), is designed. The key component of CMD is a generic knowledge selection formulation, making the selection threshold either static (CMD-S) or progressive (CMD-P). Additionally, CMD covers two special cases: zero knowledge and all knowledge, leading to a unified MKD framework. We empirically find CMD-P performs better than CMD-S. The main reason is that a model's knowledge upgrades and becomes confident as the training progresses. Extensive experiments are present to demonstrate the effectiveness of CMD and thoroughly justify the design of CMD. For example, CMD-P obtains new state-of-the-art results in robustness against label noise.


page 1

page 2

page 3

page 4


A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models

This paper aims to provide a selective survey about knowledge distillati...

DGD: Densifying the Knowledge of Neural Networks with Filter Grafting and Knowledge Distillation

With a fixed model structure, knowledge distillation and filter grafting...

Circumventing Outliers of AutoAugment with Knowledge Distillation

AutoAugment has been a powerful algorithm that improves the accuracy of ...

Using Knowledge Distillation to improve interpretable models in a retail banking context

This article sets forth a review of knowledge distillation techniques wi...

Explicit and Implicit Knowledge Distillation via Unlabeled Data

Data-free knowledge distillation is a challenging model lightweight task...

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding

Knowledge Distillation (KD) is a model compression algorithm that helps ...

Mechanisms for producing a working knowledge: Enacting, orchestrating and organizing

Given that knowledge (intensive) work takes place immersed in truly hete...

Please sign up or login with your details

Forgot password? Click here to reset