Doubly Robust Crowdsourcing

06/08/2019
by Chong Liu, et al.

Large-scale labeled datasets are the indispensable fuel that ignites the AI revolution we see today. Most such datasets are constructed with crowdsourcing services such as Amazon Mechanical Turk, which provide noisy labels from non-experts at a fair price. The sheer size of these datasets means it is feasible to collect only a few labels per data point. We formulate test-time label aggregation as a statistical estimation problem: inferring the expected voting score in an ideal world where every worker labels every item. By imitating workers with supervised learners and plugging them into a doubly robust estimation framework, we prove that the variance of the estimate can be substantially reduced, even when the learner is a poor approximation. Synthetic and real-world experiments show that, by combining the doubly robust approach with adaptive worker/item selection, we often need as few as 0.1 labels per data point to achieve nearly the same accuracy as in the ideal world where all workers label all data points.
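The core idea can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal, assumed setup in which `y` stands for the full (ideal-world) voting matrix flattened into a vector, `f` for a supervised learner imitating the workers, and `p` for the fraction of worker-item pairs actually labeled. The doubly robust estimator uses the learner's prediction everywhere and adds an inverse-propensity-weighted correction on the observed entries, so it stays unbiased even when `f` is a poor fit, while its variance shrinks as `f` improves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N worker-item pairs. In the ideal world every pair
# is labeled; in practice we only observe each label with probability p.
N = 5000
y = (rng.random(N) < 0.6).astype(float)        # ideal-world votes (unknown in practice)
f = np.clip(y + rng.normal(0, 0.1, N), 0, 1)   # learner's imitation of the workers
p = 0.1                                        # labeling budget: 0.1 labels per pair
obs = rng.random(N) < p                        # which labels were actually collected

# Doubly robust estimate of the expected voting score:
# model prediction on every pair, plus an IPW correction on observed pairs.
dr = np.mean(f + obs * (y - f) / p)

# Plain inverse-propensity (Horvitz-Thompson) baseline, no model.
ipw = np.mean(obs * y / p)

print(f"true mean {y.mean():.4f}  DR {dr:.4f}  IPW {ipw:.4f}")
```

Because the correction term has mean zero under random observation, `dr` is unbiased for any fixed `f`; the better `f` approximates `y`, the smaller the residuals `y - f` and hence the variance, which is what makes a 0.1-labels-per-point budget workable.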

Related research:

- A Minimax Optimal Algorithm for Crowdsourcing (06/01/2016): We consider the problem of accurately estimating the reliability of work...
- Approval Voting and Incentives in Crowdsourcing (02/19/2015): The growing need for labeled training data has made crowdsourcing an imp...
- Unsupervised Crowdsourcing with Accuracy and Cost Guarantees (07/05/2022): We consider the problem of cost-optimal utilization of a crowdsourcing p...
- Exploiting Heterogeneous Graph Neural Networks with Latent Worker/Task Correlation Information for Label Aggregation in Crowdsourcing (10/25/2020): Crowdsourcing has attracted much attention for its convenience to collec...
- Full Characterization of Adaptively Strong Majority Voting in Crowdsourcing (11/11/2021): A commonly used technique for quality control in crowdsourcing is to tas...
- Practice of Efficient Data Collection via Crowdsourcing at Large-Scale (12/10/2019): Modern machine learning algorithms need large datasets to be trained. Cr...
- Reduced Label Complexity For Tight ℓ_2 Regression (05/12/2023): Given data X∈ℝ^n× d and labels 𝐲∈ℝ^n the goal is find 𝐰∈ℝ^d to minimize ...
