SLaM: Student-Label Mixing for Semi-Supervised Knowledge Distillation

02/08/2023
by Vasilis Kontonis, et al.

Semi-supervised knowledge distillation is a powerful training paradigm for producing compact, lightweight student models when labeled data is scarce but a large pool of unlabeled data is available. The idea is that a large teacher model is used to generate “smoothed” pseudo-labels for the unlabeled dataset, which are then used to train the student model. Despite its success in a wide variety of applications, a shortcoming of this approach is that the teacher's pseudo-labels are often noisy, impairing student performance. In this paper, we present a principled method for semi-supervised knowledge distillation that we call Student-Label Mixing (SLaM), and we show that it consistently improves over prior approaches across several standard benchmarks. Finally, we show that SLaM comes with theoretical guarantees; along the way we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise, and provide the first convergence analysis for so-called “forward loss-adjustment” methods.
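For readers unfamiliar with the setup, the sketch below shows one generic semi-supervised distillation step in PyTorch: the teacher produces temperature-smoothed pseudo-labels on unlabeled data, and the student is trained on a supervised loss plus a distillation loss. The mixing weight `alpha`, which blends the student's own predictions into the distillation target, is a hypothetical illustration motivated by the name "Student-Label Mixing"; it is not the paper's actual algorithm, which (with its guarantees) is given in the full text.

```python
# Minimal sketch of semi-supervised distillation with teacher pseudo-labels.
# The student/teacher models, optimizer, and `alpha` mixing rule are
# illustrative assumptions, not the paper's SLaM algorithm.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x_labeled, y, x_unlabeled,
                      optimizer, alpha=0.5, temperature=2.0):
    """One training step: cross-entropy on labeled data plus a KD loss
    on unlabeled data against the teacher's smoothed pseudo-labels."""
    teacher.eval()
    with torch.no_grad():
        # "Smoothed" pseudo-labels: soften teacher logits with a temperature.
        pseudo = F.softmax(teacher(x_unlabeled) / temperature, dim=-1)

    logits_l = student(x_labeled)
    logits_u = student(x_unlabeled)

    # Hypothetical mixing: blend the student's own predictions with the
    # (possibly noisy) teacher pseudo-labels to form the KD target.
    with torch.no_grad():
        student_probs = F.softmax(logits_u, dim=-1)
    target = alpha * pseudo + (1.0 - alpha) * student_probs

    loss = F.cross_entropy(logits_l, y) + F.kl_div(
        F.log_softmax(logits_u, dim=-1), target, reduction="batchmean"
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With alpha=1 this reduces to plain teacher-label distillation; lowering alpha damps the influence of noisy teacher pseudo-labels, which is the failure mode the abstract highlights.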

Related research

- Unified and Effective Ensemble Knowledge Distillation (04/01/2022): Ensemble knowledge distillation can extract knowledge from multiple teac...
- Robust Active Distillation (10/03/2022): Distilling knowledge from a large teacher model to a lightweight one is ...
- Weighted Distillation with Unlabeled Examples (10/13/2022): Distillation with unlabeled examples is a popular and powerful method fo...
- Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering (07/20/2023): Despite the empirical success and practical significance of (relational)...
- CLIP Brings Better Features to Visual Aesthetics Learners (07/28/2023): The success of pre-training approaches on a variety of downstream tasks ...
- Multimodal Knowledge Expansion (03/26/2021): The popularity of multimodal sensors and the accessibility of the Intern...
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation (07/21/2020): Deep learning methods show promising results for overlapping cervical ce...
