Utilizing supervised models to infer consensus labels and their quality from data with multiple annotators

10/13/2022
by Hui Wen Goh, et al.

Real-world data for classification is often labeled by multiple annotators. For analyzing such data, we introduce CROWDLAB, a straightforward approach to estimate: (1) a consensus label for each example that aggregates the individual annotations (more accurately than aggregation via majority vote or other algorithms used in crowdsourcing); (2) a confidence score for how likely each consensus label is to be correct (via well-calibrated estimates that account for the number of annotations per example and their agreement, the prediction confidence of a trained classifier, and the trustworthiness of each annotator relative to the classifier); (3) a rating for each annotator quantifying the overall correctness of their labels. While many algorithms have been proposed to estimate related quantities in crowdsourcing, these often rely on sophisticated generative models with iterative inference schemes, whereas CROWDLAB is based on simple weighted ensembling. Many algorithms also rely solely on annotator statistics, ignoring the features of the examples from which the annotations derive. CROWDLAB, in contrast, utilizes any classifier model trained on these features and can thus generalize between examples with similar features. In evaluations on real-world multi-annotator image data, our proposed method provides superior estimates for (1)-(3) compared to many alternative algorithms.
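The weighted-ensembling idea behind estimates (1)-(3) can be sketched in a few lines of NumPy. The sketch below is a simplified illustration under assumed details, not the authors' exact estimator: it rates the classifier and each annotator by their agreement with a majority-vote baseline, then averages the classifier's predicted probabilities with one-hot annotator labels using those weights. The function name `crowdlab_consensus` and the specific weighting scheme are assumptions made here for illustration.

```python
import numpy as np

def crowdlab_consensus(annotator_labels, pred_probs, num_classes):
    """Simplified weighted ensemble of classifier predictions and annotator labels.

    NOTE: illustrative sketch only, not the paper's exact CROWDLAB estimator.
    annotator_labels: (n_examples, n_annotators) int array, -1 where an
        annotator did not label that example.
    pred_probs: (n_examples, num_classes) classifier predicted probabilities.
    Returns: consensus labels (1), consensus-quality scores (2),
        and per-annotator quality ratings (3).
    """
    n, m = annotator_labels.shape

    # Majority-vote baseline used to rate the classifier and each annotator.
    votes = np.zeros((n, num_classes))
    for j in range(m):
        mask = annotator_labels[:, j] >= 0
        votes[mask, annotator_labels[mask, j]] += 1
    mv = votes.argmax(axis=1)

    # Weight the classifier by how often its argmax matches the majority vote.
    clf_weight = (pred_probs.argmax(axis=1) == mv).mean()

    # Weight each annotator by agreement with the majority vote on the
    # examples they actually labeled.
    ann_weights = np.zeros(m)
    for j in range(m):
        mask = annotator_labels[:, j] >= 0
        if mask.any():
            ann_weights[j] = (annotator_labels[mask, j] == mv[mask]).mean()

    # Ensemble: weighted average of classifier probabilities and one-hot
    # annotator labels, normalized per example by the total weight present.
    ensemble = clf_weight * pred_probs
    total_weight = np.full(n, clf_weight)
    for j in range(m):
        mask = annotator_labels[:, j] >= 0
        ensemble[mask] += ann_weights[j] * np.eye(num_classes)[annotator_labels[mask, j]]
        total_weight[mask] += ann_weights[j]
    ensemble /= total_weight[:, None]

    consensus = ensemble.argmax(axis=1)       # (1) consensus labels
    quality = ensemble.max(axis=1)            # (2) confidence each is correct
    return consensus, quality, ann_weights    # (3) annotator ratings

# Example: 4 examples, 3 annotators (-1 = not labeled), 2 classes.
labels = np.array([[0, 0, -1],
                   [1, -1, 1],
                   [0, 1, 1],
                   [-1, -1, 0]])
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.7, 0.3]])
consensus, quality, annotator_quality = crowdlab_consensus(labels, probs, num_classes=2)
```

Note how the sketch mirrors the abstract's claims: examples with more annotations and higher agreement receive higher total weight (and hence more confident scores), and a well-calibrated classifier lets the estimates generalize to examples with few or conflicting annotations.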


Related research

ActiveLab: Active Learning with Re-Labeling by Multiple Annotators (01/27/2023)
In real-world data labeling applications, annotators often provide imper...

Rethinking Crowdsourcing Annotation: Partial Annotation with Salient Labels for Multi-Label Image Classification (09/06/2021)
Annotated images are required for both supervised model training and eva...

Semisupervised Classifier Evaluation and Recalibration (10/08/2012)
How many labeled examples are needed to estimate a classifier's performa...

Identifying Incorrect Annotations in Multi-Label Classification Data (11/25/2022)
In multi-label classification, each example in a dataset may be annotate...

A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing (11/19/2022)
Due to the noises in crowdsourced labels, label aggregation (LA) has eme...

Detecting Label Errors in Token Classification Data (10/08/2022)
Mislabeled examples are a common issue in real-world data, particularly ...

TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings (06/18/2012)
This paper revisits the problem of analyzing multiple ratings given by d...
