Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise

by Fahimeh Fooladgar, et al.

Deep neural networks have proven to be highly effective when large amounts of data with clean labels are available. However, their performance degrades when the training data contains noisy labels, leading to poor generalization on the test set. Real-world datasets contain noisy-label samples that either have similar visual semantics to other classes (in-distribution, ID) or have no semantic relevance to any class (out-of-distribution, OOD) in the dataset. Most state-of-the-art methods treat ID noisy samples as unlabeled data for semi-supervised learning, but OOD noisy samples cannot be used in this way because they do not belong to any class within the dataset. Hence, in this paper, we propose incorporating the information from all the training data by leveraging the benefits of self-supervised training. Our method aims to extract a meaningful and generalizable embedding space for each sample regardless of its label. Then, we employ a simple yet effective K-nearest neighbor method to remove a portion of the out-of-distribution samples. After discarding these samples, we propose an iterative "Manifold DivideMix" algorithm to find clean and noisy samples and train our model in a semi-supervised way. In addition, we propose "MixEMatch", a new algorithm for the semi-supervised step that applies mixup augmentation at both the input and the final hidden representations of the model. Interpolating in both the input and manifold spaces is intended to extract better representations. Extensive experiments on multiple synthetic-noise image benchmarks and real-world web-crawled datasets demonstrate the effectiveness of our proposed framework. Code is available at
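
The abstract names two building blocks without giving details: a K-nearest neighbor filter over the learned embeddings, and mixup interpolation applied to inputs and to hidden representations. The following is a minimal sketch of both ideas under stated assumptions, not the authors' implementation. The scoring rule (mean distance to the k nearest neighbors), the `drop_fraction` parameter, and all function names are hypothetical choices made for illustration. Mixup is shown in its standard form, a convex combination with a coefficient drawn from a Beta distribution.

```python
import math
import random


def knn_outlier_scores(embeddings, k):
    """Mean Euclidean distance from each embedding to its k nearest neighbors.

    Assumption: a large score suggests an isolated, possibly OOD sample.
    """
    scores = []
    for i, a in enumerate(embeddings):
        dists = sorted(
            math.dist(a, b) for j, b in enumerate(embeddings) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


def filter_suspected_ood(embeddings, k, drop_fraction):
    """Return the indices kept after dropping the highest-scoring fraction.

    `drop_fraction` is a hypothetical knob, not a value from the paper.
    """
    scores = knn_outlier_scores(embeddings, k)
    n_drop = int(len(embeddings) * drop_fraction)
    ranked = sorted(range(len(embeddings)), key=lambda i: scores[i])
    return sorted(ranked[: len(embeddings) - n_drop])


def mixup(a, b, lam):
    """Convex combination of two equal-length vectors.

    The same formula applies whether a and b are raw inputs or hidden features.
    """
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def sample_lambda(alpha, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)
```

For example, calling `filter_suspected_ood` on four tightly grouped 2-D points plus one distant point, with `k=2` and `drop_fraction=0.2`, drops only the distant point. In a real pipeline the embeddings would come from the self-supervised encoder, and the interpolation would run on batched tensors rather than Python lists.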




DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Deep neural networks are known to be annotation-hungry. Numerous efforts...

UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning

Supervised deep learning methods require a large repository of annotated...

ScanMix: Learning from Severe Label Noise via Semantic Clustering and Semi-Supervised Learning

In this paper, we address the problem of training deep neural networks i...

Class Prototype-based Cleaner for Label Noise Learning

Semi-supervised learning based methods are current SOTA solutions to the...

Ensemble Manifold Segmentation for Model Distillation and Semi-supervised Learning

Manifold theory has been the central concept of many learning methods. H...

Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets

Using search engines for web image retrieval is a tempting alternative t...

Differences Between Hard and Noisy-labeled Samples: An Empirical Study

Extracting noisy or incorrectly labeled samples from a labeled dataset w...
