ALM-KD: Knowledge Distillation with noisy labels via adaptive loss mixing

02/07/2022
by   Durga Sivasubramanian, et al.

Knowledge distillation (KD) is a technique in which the outputs of a pretrained model, known as the teacher model, are used to train a student model in a supervised setting. Because the teacher's outputs form a richer distribution over labels than the usual hard one-hot targets, they should improve the student model's performance. However, the label distribution induced by the teacher's logits is not always informative and can lead to poor student performance. We tackle this problem with an adaptive loss mixing scheme during KD. Specifically, our method learns an instance-specific convex combination of the teacher-matching and label-supervision objectives, using meta-learning on a validation metric to signal to the student 'how much' KD should be used. Through a range of experiments on controlled synthetic data and real-world datasets, we demonstrate the performance gains obtained by our approach in the standard KD setting as well as in multi-teacher and self-distillation settings.
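The abstract describes an instance-specific convex combination of the teacher-matching (KD) objective and the hard-label objective, with the mixing weight learned by meta-learning on a validation metric. Below is a minimal PyTorch sketch of such a mixed loss; the function name `mixed_kd_loss`, the temperature `T`, and the per-instance weight `alpha` are illustrative assumptions, not the authors' exact implementation (in the paper, the mixing weights would themselves be updated via a meta-gradient from a validation metric).

```python
# Minimal sketch of an instance-wise adaptive loss mixing step for KD.
# Names and hyperparameters (alpha, T) are illustrative assumptions.
import torch
import torch.nn.functional as F

def mixed_kd_loss(student_logits, teacher_logits, labels, alpha, T=4.0):
    """Per-instance convex combination of the teacher-matching (KD) loss
    and the hard-label cross-entropy loss.

    alpha: tensor of shape (batch,) with values in [0, 1]; alpha close to 1
    trusts the teacher, alpha close to 0 falls back to label supervision.
    """
    # Teacher-matching term: KL divergence between temperature-softened
    # student and teacher distributions, scaled by T^2 as in standard KD.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="none",
    ).sum(dim=-1) * (T * T)

    # Label-supervision term: standard cross-entropy with hard labels.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Instance-specific convex combination, averaged over the batch.
    return (alpha * kd + (1.0 - alpha) * ce).mean()
```

In a full training loop, `alpha` could be produced per instance by a small auxiliary network or kept as free parameters, and updated so that a one-step "virtual" student update under the mixed loss reduces the loss on a held-out validation set, in the spirit of the meta-learning scheme sketched in the abstract.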
