Identifying noisy labels with a transductive semi-supervised leave-one-out filter

Obtaining data with meaningful labels is often costly and error-prone. In this situation, semi-supervised learning (SSL) approaches are attractive, as they leverage assumptions about the unlabeled data to make up for the limited number of labels. However, in real-world situations, we cannot assume that the labeling process is infallible, and the accuracy of many SSL classifiers decreases significantly in the presence of label noise. In this work, we introduce LGC_LVOF, a leave-one-out filtering approach based on the Local and Global Consistency (LGC) algorithm. Our method aims to detect and remove wrong labels, and can therefore be used as a preprocessing step for any SSL classifier. Given the propagation matrix, detecting noisy labels takes O(cl) per step, with c the number of classes and l the number of labeled instances. Moreover, one does not need to compute the whole propagation matrix, but only the l by l submatrix corresponding to interactions between labeled instances. As a result, our approach is best suited to datasets with a large amount of unlabeled data but few labels. Results are provided for a number of datasets, including MNIST and ISOLET. LGC_LVOF appears to be equally or more precise than the adapted gradient-based filter. We show that, in the best case, embedding LGC_LVOF into LGC yields accuracy comparable to the best case of ℓ_1-based classifiers designed to be robust to label noise. We also provide a heuristic for choosing the number of instances to remove.
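To make the leave-one-out idea concrete, the following is a minimal sketch and not the authors' LGC_LVOF implementation. It assumes the standard LGC propagation matrix P = (I - αS)^{-1} with a Gaussian affinity, and it flags a labeled instance when the class predicted from all other labels disagrees with its own label. The function names, the parameters `alpha` and `sigma`, and the single-pass flagging rule are all illustrative assumptions; the abstract describes a stepwise procedure with a heuristic for how many instances to remove, which is not reproduced here. For simplicity the sketch builds the full propagation matrix and then slices out the l by l labeled block, whereas the abstract notes that only that block is needed.

```python
import numpy as np

def lgc_propagation_matrix(X, alpha=0.9, sigma=1.0):
    """Dense LGC propagation matrix P = (I - alpha * S)^-1.

    S is the symmetrically normalized Gaussian affinity matrix.
    alpha and sigma are illustrative defaults, not values from the paper.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.inv(np.eye(len(X)) - alpha * S)

def leave_one_out_suspects(P_ll, labels, n_classes):
    """Flag labeled instances whose leave-one-out prediction disagrees.

    P_ll is the (l, l) block of P restricted to labeled instances.
    Hypothetical helper: a single pass, with no iterative removal.
    """
    Y = np.eye(n_classes)[labels]            # one-hot labels, shape (l, c)
    F = P_ll @ Y                             # scores using every label
    F_loo = F - np.diag(P_ll)[:, None] * Y   # drop each instance's own label
    loo_pred = F_loo.argmax(axis=1)
    return np.flatnonzero(loo_pred != labels), loo_pred

# Toy data: two well-separated clusters of six points, five labeled in each.
cluster = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
                    [0.5, 0.5], [0.25, 0.25], [0.25, 0.0]])
X = np.vstack([cluster, cluster + 10.0])
labeled_idx = np.array([0, 1, 2, 3, 4, 6, 7, 8, 9, 10])
labels = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])  # position 2 is mislabeled

P = lgc_propagation_matrix(X)
P_ll = P[np.ix_(labeled_idx, labeled_idx)]
suspects, loo_pred = leave_one_out_suspects(P_ll, labels, n_classes=2)
print("suspect positions among labeled instances:", suspects)
```

Because P does not depend on the labels, removing one label only subtracts that instance's own term from its score row, so each leave-one-out score costs O(c) and a full pass over the labeled set costs O(cl), which matches the per-step cost quoted above.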


