Bayesian Nonparametric Crowdsourcing

07/18/2014
by Pablo G. Moreno, et al.

Crowdsourcing has proven to be an effective and efficient way to annotate large datasets. User annotations are often noisy, so methods that combine them into reliable estimates of the ground truth are necessary. We claim that accounting for clusters of users in this combination step can improve performance. This is especially important in the early stages of a crowdsourcing deployment, when the number of annotations is low: there is not yet enough information to accurately estimate the bias introduced by each annotator separately, so we must resort to models that consider the statistical links among annotators. In addition, finding these clusters is interesting in itself, since knowing the behavior of the pool of annotators makes it possible to implement efficient active learning strategies. Based on this, we propose two new fully unsupervised models, built on a Chinese Restaurant Process (CRP) prior and a hierarchical structure, that infer these groups jointly with the ground truth and the properties of the users. We propose efficient inference algorithms based on Gibbs sampling with auxiliary variables. Finally, we perform experiments on both synthetic and real databases to show the advantages of our models over state-of-the-art algorithms.
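
As a rough illustration of the kind of prior mentioned above, the following sketch draws a random partition of annotators from a Chinese Restaurant Process. It is not the paper's model or inference algorithm: the function name `sample_crp`, the concentration parameter `alpha`, and the `seed` argument are assumptions made for this example, and it only shows how a CRP assigns users to an unbounded number of clusters.

```python
import random


def sample_crp(num_users, alpha, seed=None):
    """Draw cluster assignments for num_users from a CRP (illustrative only).

    alpha is the concentration parameter (assumed > 0); larger values
    tend to produce more clusters.
    """
    rng = random.Random(seed)
    assignments = []    # cluster index assigned to each user, in arrival order
    cluster_sizes = []  # number of users currently in each cluster

    for _ in range(num_users):
        # An existing cluster is chosen with weight equal to its size,
        # and a new cluster with weight alpha. rng.choices normalizes
        # the weights, so the probabilities are size_k / (n + alpha)
        # and alpha / (n + alpha) for n users already seated.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1    # join an existing cluster
        assignments.append(k)

    return assignments, cluster_sizes


if __name__ == "__main__":
    assignments, sizes = sample_crp(num_users=20, alpha=1.0, seed=0)
    print("assignments:", assignments)
    print("cluster sizes:", sizes)
```

The number of clusters is not fixed in advance; it grows with the number of users at a rate controlled by `alpha`, which is what makes this kind of prior a natural fit when the number of annotator groups is unknown.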

