Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

09/26/2019
by Shahana Ibrahim, et al.

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels by integrating noisy, non-expert labels from annotators. The classic Dawid-Skene estimator and its accompanying expectation-maximization (EM) algorithm have been widely used, but their theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but their sample complexity is a hurdle in practice, because they hinge on third-order statistics that are hard to estimate reliably from limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm, reminiscent of convex geometry-based structured matrix factorization, to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform state-of-the-art algorithms under a variety of scenarios.
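To make "pairwise co-occurrences" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the standard Dawid-Skene setup, in which annotators respond independently given the true label, so the joint distribution of two annotators' responses is R = A_m diag(d) A_l^T, where A_m and A_l are confusion matrices and d is the class prior. The function names, the example confusion matrices, and the prior are all hypothetical, and every annotator is assumed to label every item.

```python
import numpy as np

def simulate_responses(truth, confusion, rng):
    """Draw one annotator's responses given true labels.

    confusion[r, k] = P(response = r | true label = k); columns sum to 1.
    """
    K = confusion.shape[0]
    responses = np.empty(truth.shape[0], dtype=int)
    for k in range(K):
        idx = truth == k
        responses[idx] = rng.choice(K, size=int(idx.sum()), p=confusion[:, k])
    return responses

def pairwise_cooccurrence(resp_m, resp_l, K):
    """Empirical joint distribution of two annotators' responses."""
    counts = np.zeros((K, K))
    np.add.at(counts, (resp_m, resp_l), 1)  # count each (r_m, r_l) pair
    return counts / resp_m.shape[0]

# Hypothetical example values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
prior = np.array([0.5, 0.3, 0.2])
A_m = np.array([[0.8, 0.1, 0.1],
                [0.1, 0.8, 0.1],
                [0.1, 0.1, 0.8]])
A_l = np.array([[0.70, 0.15, 0.15],
                [0.15, 0.70, 0.15],
                [0.15, 0.15, 0.70]])

truth = rng.choice(3, size=50_000, p=prior)
resp_m = simulate_responses(truth, A_m, rng)
resp_l = simulate_responses(truth, A_l, rng)

R_hat = pairwise_cooccurrence(resp_m, resp_l, K=3)  # estimated from data
R_model = A_m @ np.diag(prior) @ A_l.T              # model-implied value
print(np.abs(R_hat - R_model).max())
```

Each pairwise matrix has only K x K entries to estimate, whereas a third-order statistic over three annotators has K x K x K, which is one intuition for why second-order statistics are easier to estimate from limited data.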

research
06/30/2020

Recovering Joint Probability of Discrete Random Variables from Pairwise Marginals

Learning the joint probability of random variables (RVs) lies at the hea...
research
06/14/2021

Crowdsourcing via Annotator Co-occurrence Imputation and Provable Symmetric Nonnegative Matrix Factorization

Unsupervised learning of the Dawid-Skene (D&S) model from noisy, incom...
research
02/15/2023

Enhanced Nonlinear System Identification by Interpolating Low-Rank Tensors

Function approximation from input and output data is one of the most inv...
research
05/17/2019

MiSC: Mixed Strategies Crowdsourcing

Popular crowdsourcing techniques mostly focus on evaluating workers' lab...
research
08/10/2020

Statistical Query Lower Bounds for Tensor PCA

In the Tensor PCA problem introduced by Richard and Montanari (2014), on...
research
11/16/2017

HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation

Recently, crowdsourcing has emerged as an effective paradigm for human-p...
