Understanding Self-Distillation in the Presence of Label Noise

01/30/2023
by Rudrajit Das, et al.

Self-distillation (SD) is the process of first training a teacher model and then using its predictions to train a student model with the same architecture. Specifically, the student's objective function is ξ·ℓ(teacher's predictions, student's predictions) + (1−ξ)·ℓ(given labels, student's predictions), where ℓ is some loss function and ξ is a parameter in [0,1]. Empirically, SD has been observed to provide performance gains in several settings. In this paper, we theoretically characterize the effect of SD in two supervised learning problems with noisy labels. We first analyze SD for regularized linear regression and show that, in the high label noise regime, the optimal value of ξ that minimizes the expected error in estimating the ground-truth parameter is surprisingly greater than 1. Empirically, we show that ξ > 1 works better than ξ ≤ 1 even with the cross-entropy loss for several classification datasets when 50% or 30% of the labels are corrupted. Further, we quantify when optimal SD is better than optimal regularization. Next, we analyze SD in the case of logistic regression for binary classification with random label corruption and quantify the range of label corruption in which the student outperforms the teacher in terms of accuracy. To our knowledge, this is the first result of its kind for the cross-entropy loss.
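As a concrete illustration of the objective above, here is a minimal PyTorch-style sketch of the mixed loss (hypothetical function and variable names, not the authors' code). Note that nothing in the formula forces ξ ≤ 1: setting ξ > 1 places a negative weight on the given (possibly noisy) labels, which is the regime the abstract argues is optimal under high label noise.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, labels, xi=0.5):
    """Sketch of the SD objective: xi * CE(teacher preds, student preds)
    + (1 - xi) * CE(given labels, student preds)."""
    # Soft targets from the (frozen) teacher; detach so no gradient flows to it.
    teacher_probs = F.softmax(teacher_logits.detach(), dim=-1)
    log_student = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy of the student's predictions against the teacher's soft labels.
    distill = -(teacher_probs * log_student).sum(dim=-1).mean()
    # Standard cross-entropy against the (possibly noisy) given labels.
    supervised = F.cross_entropy(student_logits, labels)
    # With xi > 1, the second term gets negative weight, actively
    # down-weighting the noisy labels rather than just ignoring them.
    return xi * distill + (1.0 - xi) * supervised

# Usage example: 8 samples, 10 classes, xi > 1 as in the high-noise regime.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = self_distillation_loss(student_logits, teacher_logits, labels, xi=1.2)
loss.backward()
```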

Related research

11/21/2022 - Blind Knowledge Distillation for Robust Image Classification
Optimizing neural networks with noisy labels is a challenging task, espe...

02/07/2022 - ALM-KD: Knowledge Distillation with noisy labels via adaptive loss mixing
Knowledge distillation is a technique where the outputs of a pretrained ...

03/22/2022 - Learning curves for the multi-class teacher-student perceptron
One of the most classical results in high-dimensional learning theory pr...

07/19/2020 - Self-similarity Student for Partial Label Histopathology Image Segmentation
Delineation of cancerous regions in gigapixel whole slide images (WSIs) ...

06/05/2023 - Deep Learning From Crowdsourced Labels: Coupled Cross-entropy Minimization, Identifiability, and Regularization
Using noisy crowdsourced labels from multiple annotators, a deep learnin...

08/07/2022 - Preserving Fine-Grain Feature Information in Classification via Entropic Regularization
Labeling a classification dataset implies to define classes and associat...

06/09/2020 - Self-Distillation as Instance-Specific Label Smoothing
It has been recently demonstrated that multi-generational self-distillat...
