GPM: A Generic Probabilistic Model to Recover Annotator's Behavior and Ground Truth Labeling

03/01/2020
by   Jing Li, et al.

In the big data era, labels can be obtained through crowdsourcing. However, crowdsourced labels are generally noisy, unreliable, or even adversarial. In this paper, we propose a probabilistic graphical annotation model that infers both the underlying ground truth and the annotators' behavior. To accommodate both discrete and continuous application scenarios (e.g., classifying scenes vs. rating videos on a Likert scale), the underlying ground truth is modeled as a distribution rather than a single value. In this way, the reliable but potentially divergent opinions of "good" annotators can be recovered. The proposed model can identify whether an annotator worked diligently on the task during the labeling procedure, which can be used to select qualified annotators. The model has been tested on both simulated and real-world data, where it consistently outperforms other state-of-the-art models in accuracy and robustness.
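To make the idea concrete, here is a minimal sketch for the discrete case. It is not the authors' GPM formulation, which the abstract does not spell out. It assumes a simplified two-component mixture: with probability `reliability[j]` annotator `j` draws a label from the item's ground-truth distribution, and otherwise picks a class uniformly at random. The function names, the EM updates, the smoothing constant, and the toy data are all illustrative assumptions.

```python
def update_truth(labels, weights, num_classes, smoothing):
    """Per-item class distribution from diligence-weighted votes."""
    truth = []
    for item_labels, item_weights in zip(labels, weights):
        counts = [smoothing] * num_classes
        for label, weight in zip(item_labels, item_weights):
            counts[label] += weight
        total = sum(counts)
        truth.append([c / total for c in counts])
    return truth


def fit_diligence_mixture(labels, num_classes, iterations=50, smoothing=0.01):
    """EM for a simplified 'diligent vs. uniform guess' annotator mixture.

    labels[i][j] is the class index that annotator j gave to item i.
    Returns (truth, reliability): a class distribution per item and an
    estimated probability of diligent labeling per annotator.
    """
    num_items = len(labels)
    num_annotators = len(labels[0])
    reliability = [0.8] * num_annotators  # assumed starting value
    weights = [[1.0] * num_annotators for _ in range(num_items)]
    truth = update_truth(labels, weights, num_classes, smoothing)

    for _ in range(iterations):
        # E-step: posterior probability that each label was given diligently.
        for i in range(num_items):
            for j in range(num_annotators):
                diligent = reliability[j] * truth[i][labels[i][j]]
                careless = (1.0 - reliability[j]) / num_classes
                weights[i][j] = diligent / (diligent + careless)
        # M-step: re-estimate annotator reliability and item distributions.
        reliability = [
            sum(weights[i][j] for i in range(num_items)) / num_items
            for j in range(num_annotators)
        ]
        truth = update_truth(labels, weights, num_classes, smoothing)
    return truth, reliability


# Toy data (made up): three annotators agree with the true class,
# while the fourth always reports a different class.
true_classes = [0, 1, 2, 0, 1, 2]
labels = [[t, t, t, (t + 1) % 3] for t in true_classes]

truth, reliability = fit_diligence_mixture(labels, num_classes=3)
recovered = [dist.index(max(dist)) for dist in truth]
```

On this toy input the estimated reliability of the fourth annotator falls well below that of the other three, and `recovered` matches `true_classes`. Keeping a full distribution per item, rather than a single label, is what lets divergent but diligent opinions survive in `truth`.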

