A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness

06/30/2016
by Nihar B. Shah, et al.

The aggregation and denoising of crowd-labeled data has gained significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a high-dimensional non-asymptotic framework that allows both the number of workers and the number of tasks to scale, we derive optimal rates of convergence for the permutation-based model. We show that the permutation-based model offers significant robustness in estimation due to its richness, while surprisingly incurring only a small additional statistical penalty compared to the Dawid-Skene model. Finally, we propose a computationally efficient method, called the OBI-WAN estimator, that is uniformly optimal over a class intermediate between the permutation-based and the Dawid-Skene models, and is uniformly consistent over the entire permutation-based model class. In contrast, the guarantees for estimators available in prior literature are sub-optimal over the original Dawid-Skene model.
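To make the setting concrete, here is a minimal sketch of the crowd-labeling problem itself, not of the paper's method. It assumes binary labels in {-1, +1} and a Dawid-Skene-style simplification in which each worker has a single probability of answering any task correctly. The function names (`simulate_labels`, `majority_vote`, `hamming_error`), the ability values, and the use of majority voting with Hamming error are all illustrative assumptions; the abstract does not define the OBI-WAN estimator or the new error metric, so neither is implemented here.

```python
import random


def simulate_labels(abilities, truth, seed=0):
    """Simulate binary crowd labels in {-1, +1}.

    Assumption (Dawid-Skene-style simplification): worker i answers every
    task correctly with the same probability abilities[i].
    Returns one row of labels per worker.
    """
    rng = random.Random(seed)
    return [[t if rng.random() < q else -t for t in truth] for q in abilities]


def majority_vote(labels):
    """Baseline aggregator: sign of the per-task vote sum (ties go to +1).

    This is a generic baseline, NOT the OBI-WAN estimator from the paper.
    """
    n_tasks = len(labels[0])
    return [1 if sum(row[j] for row in labels) >= 0 else -1
            for j in range(n_tasks)]


def hamming_error(estimate, truth):
    """Fraction of tasks labeled incorrectly (not the paper's error metric)."""
    return sum(e != t for e, t in zip(estimate, truth)) / len(truth)


# Hypothetical example: 5 workers of varying ability label 8 tasks.
truth = [1, -1, 1, 1, -1, -1, 1, -1]
abilities = [0.9, 0.8, 0.7, 0.6, 0.55]
labels = simulate_labels(abilities, truth, seed=0)
estimate = majority_vote(labels)
print(hamming_error(estimate, truth))
```

In this sketch a worker's accuracy is the same on every task, which is the restriction that a richer model class would relax.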


