Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing

by Hyeonsu Jeong, et al.

Crowdsourcing has emerged as an effective platform for labeling large volumes of data in a cost- and time-efficient manner. Most previous works have focused on designing efficient algorithms that recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourced labeling with the goal of recovering not only the ground truth but also the most confusing answer and the confusion probability. The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model in which each task has two top plausible answers, distinguished from the remaining choices. Task difficulty is quantified by the confusion probability between the top two, and worker reliability is quantified by the probability of giving an answer among the top two. Under this model, we propose a two-stage inference algorithm to infer the top-two answers as well as the confusion probability. We show that our algorithm achieves the minimax optimal convergence rate. We conduct both synthetic and real-data experiments and demonstrate that our algorithm outperforms other recent algorithms. We also show the applicability of our algorithm in inferring the difficulty of tasks and in training neural networks with soft labels composed of the top-two most plausible classes.
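To make the top-two model concrete, the following is a minimal Python sketch for a single task. It is not the two-stage algorithm from the paper. It rests on several assumptions that the abstract does not state: every worker shares the same reliability p, a worker who answers within the top two picks the ground truth with probability q and the most confusing answer with probability 1 - q, and a worker who misses the top two picks uniformly among the remaining choices. The names `simulate_answers` and `estimate_top_two` are hypothetical, and the estimator is a plain vote count that merely illustrates what "top-two answers" and "confusion probability" refer to.

```python
import random
from collections import Counter


def simulate_answers(num_workers, num_choices, truth, confusing, p, q, rng):
    """Draw one answer per worker for a single task (assumed model, see lead-in)."""
    # Choices outside the top two; assumed to be hit uniformly at random.
    others = [c for c in range(num_choices) if c not in (truth, confusing)]
    answers = []
    for _ in range(num_workers):
        if rng.random() < p:
            # Worker answers within the top two: truth w.p. q, else the confusing answer.
            answers.append(truth if rng.random() < q else confusing)
        else:
            # Worker misses the top two entirely.
            answers.append(rng.choice(others))
    return answers


def estimate_top_two(answers):
    """Naive vote-count estimate of the top-two answers and the confusion probability."""
    counts = Counter(answers)
    (first, n1), (second, n2) = counts.most_common(2)
    # Share of top-two votes that went to the most voted answer.
    return first, second, n1 / (n1 + n2)


rng = random.Random(0)  # fixed seed so the run is reproducible
answers = simulate_answers(5000, 5, truth=2, confusing=4, p=0.8, q=0.7, rng=rng)
top1, top2, q_hat = estimate_top_two(answers)
print(top1, top2, round(q_hat, 3))
```

With many workers per task, the vote count tends to recover both answers and an estimate of q close to the simulated value. Real crowdsourcing data has few answers per task and workers of varying reliability, which is the setting that calls for a more careful inference procedure than this sketch.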




