Harmless label noise and informative soft-labels in supervised classification

04/07/2021
by Daniel Ahfock, et al.

Manual labelling of training examples is common practice in supervised learning. When the labelling task is difficult, the supplied labels may differ from the ground-truth labels, which introduces label noise into the training dataset. If the manual annotation is carried out by multiple experts, the same training example can be given different class assignments by different experts, which is indicative of label noise. In the framework of model-based classification, a simple but key observation is that when the manual labels are sampled using the posterior probabilities of class membership, the noisy labels are as valuable as the ground-truth labels in terms of statistical information. A relaxation of this process is a random effects model for imperfect labelling by a group that uses approximate posterior probabilities of class membership. The relative efficiency of logistic regression using the noisy labels, compared to logistic regression using the ground-truth labels, can then be derived. The main finding is that logistic regression can be robust to label noise when label noise and classification difficulty are positively correlated. In particular, when classification difficulty is the only source of label errors, multiple sets of noisy labels can supply more information for the estimation of a classification rule than a single set of ground-truth labels.
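The key observation in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code, and every specific in it is an assumption made for illustration: a one-dimensional two-class Gaussian model with means -1 and +1, unit variance and equal class priors, for which the posterior probability of class 1 is sigmoid(2x); noisy labels drawn from that posterior independently of the true label; and a hand-written Newton solver named fit_logistic. Under these assumptions, logistic regression fitted to the noisy labels should recover roughly the same coefficients (intercept near 0, slope near 2) as logistic regression fitted to the ground-truth labels, even though a sizeable fraction of the labels disagree.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton's method (illustrative helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the (symmetric) Fisher information matrix.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, using the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate(n, seed):
    """Draw features, ground-truth labels z, and posterior-sampled noisy labels y."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0          # ground-truth class
        x = rng.gauss(1.0 if z else -1.0, 1.0)      # feature given the class
        # Assumed annotator: samples a label from the exact posterior P(Z=1|x).
        y = 1 if rng.random() < sigmoid(2.0 * x) else 0
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys


xs, zs, ys = simulate(5000, seed=0)
beta_truth = fit_logistic(xs, zs)   # fit on ground-truth labels
beta_noisy = fit_logistic(xs, ys)   # fit on posterior-sampled labels
disagreement = sum(z != y for z, y in zip(zs, ys)) / len(zs)

print("ground-truth fit (intercept, slope):", beta_truth)
print("noisy-label fit  (intercept, slope):", beta_noisy)
print("fraction of labels that differ:", disagreement)
```

In this setup the noisy label and the true label have the same conditional distribution given x, which is why the two fits target the same coefficients. The sketch covers only the idealised case of exact posterior sampling; the random effects relaxation described in the abstract is not modelled here.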

research
11/05/2022

A Comparison of Automatic Labelling Approaches for Sentiment Analysis

Labelling a large quantity of social media data for the task of supervis...
research
07/09/2021

Batch Inverse-Variance Weighting: Deep Heteroscedastic Regression

Heteroscedastic regression is the task of supervised learning where each...
research
12/09/2014

Cancer Detection with Multiple Radiologists via Soft Multiple Instance Logistic Regression and L_1 Regularization

This paper deals with the multiple annotation problem in medical applica...
research
10/02/2020

Deep Learning for Earth Image Segmentation based on Imperfect Polyline Labels with Annotation Errors

In recent years, deep learning techniques (e.g., U-Net, DeepLab) have ac...
research
07/29/2020

Difficulty-aware Glaucoma Classification with Multi-Rater Consensus Modeling

Medical images are generally labeled by multiple experts before the fina...
research
05/10/2018

Labelling as an unsupervised learning problem

Unravelling hidden patterns in datasets is a classical problem with many...
