Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets

07/04/2022
by Paul Albert, et al.

Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset, but the main drawback remains the proportion of incorrect (noisy) samples retrieved. Previous work has shown these noisy samples to be a mixture of in-distribution (ID) samples, which are assigned to the incorrect category but are visually similar to other classes in the dataset, and out-of-distribution (OOD) images, which share no semantic correlation with any category in the dataset. The latter are, in practice, the dominant type of noisy image retrieved. To tackle this noise duality, we propose a two-stage algorithm. The first stage is a detection step in which we use unsupervised contrastive feature learning to represent images in a feature space. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere. We then spectrally embed the unsupervised representations using a fixed neighborhood size and apply an outlier-sensitive clustering at the class level to detect the clean and OOD clusters as well as ID noisy outliers. In the second stage, we train a noise-robust neural network that corrects ID noise to the correct category and uses OOD samples in a guided contrastive objective, clustering them to improve low-level features. Our algorithm improves on state-of-the-art results on image datasets with synthetic noise as well as on real-world web-crawled data. Our work is fully reproducible [github].
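
The embedding part of the detection step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`l2_normalize`, `knn_affinity`, `spectral_embed`), the synthetic features, the binary k-nearest-neighbor graph, and the values `k=10` and `dim=2` are all assumptions made for illustration. The sketch projects features onto the unit hypersphere, links each sample to a fixed number of neighbors, and embeds the graph with the low eigenvectors of its normalized Laplacian.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Project each feature vector onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def knn_affinity(z, k):
    """Symmetric 0/1 graph linking each sample to its k most cosine-similar samples."""
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)              # never pick a sample as its own neighbor
    idx = np.argsort(-sim, axis=1)[:, :k]       # indices of the k nearest neighbors
    n = z.shape[0]
    a = np.zeros((n, n))
    a[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(a, a.T)                   # symmetrize

def spectral_embed(a, dim):
    """Embed nodes with eigenvectors of the symmetric normalized graph Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))   # every node has degree >= k > 0
    lap = np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)               # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]                   # skip the first eigenvector

# Synthetic stand-in for contrastive features of one class (assumption):
# 40 "ID" samples around one direction, 20 "OOD" samples around another.
rng = np.random.default_rng(0)
id_feats = rng.normal(size=16) + 0.3 * rng.normal(size=(40, 16))
ood_feats = rng.normal(size=16) + 0.3 * rng.normal(size=(20, 16))

z = l2_normalize(np.concatenate([id_feats, ood_feats]))
affinity = knn_affinity(z, k=10)                # fixed neighborhood size
emb = spectral_embed(affinity, dim=2)
print(emb.shape)
```

An outlier-sensitive clustering would then be run on `emb` separately for each class; a density-based method is one plausible choice, but the abstract does not name the algorithm, so that step is left out here.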

