Parity Crowdsourcing for Cooperative Labeling

09/04/2018
by Hye Won Chung, et al.

Consider a database of k objects, e.g., a set of videos, where each object has a binary attribute, e.g., a video's suitability for children. The attributes of the objects are to be determined under a crowdsourcing model: a worker is queried about the labels of a chosen subset of objects and either responds with a correct binary answer or declines to respond. Here we propose a parity response model: the worker is asked to check whether the number of objects having a given attribute in the chosen subset is even or odd. For example, if the subset includes two objects, the worker checks whether the two belong to the same class or not. We propose a method for designing the sequence of subsets of objects to be queried so that the attributes of the objects can be identified with high probability using few (n) answers. The method is based on an analogy to the design of Fountain codes for erasure channels. We define the query difficulty d̅ as the average size of the query subsets and we define the sample complexity n as the minimum number of collected answers required to attain a given recovery accuracy. We obtain fundamental tradeoffs between recovery accuracy, query difficulty, and sample complexity. In particular, the necessary and sufficient sample complexity required for recovering all k attributes with high probability is n = c_0 max{k, (k log k)/d̅}, and the sample complexity for recovering a fixed proportion (1-δ)k of the attributes for δ = o(1) is n = c_1 max{k, (k log(1/δ))/d̅}, where c_0, c_1 > 0 are constants.
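To make the parity response model concrete, the following is a minimal Python sketch, not the authors' query design. It assumes uniformly random query subsets mixed with some single-object queries, whereas the paper designs the subset sequence by analogy with Fountain codes. It also swaps in a simple peeling decoder of the kind used for erasure codes. The function names (`parity_answer`, `peel`, `simulate`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def parity_answer(labels, subset, decline_prob, rng):
    """Return the XOR of the labels in `subset`, or None if the worker declines.

    Answers that are given are always correct, as in the model described above.
    """
    if rng.random() < decline_prob:
        return None
    parity = 0
    for i in subset:
        parity ^= labels[i]
    return parity


def peel(answered):
    """Peeling decoder (illustrative choice, not taken from the paper).

    `answered` is a list of (subset, parity) pairs. Known labels are
    substituted into each query; a query left with exactly one unknown
    object reveals that object's label.
    """
    recovered = {}
    pending = [[set(subset), parity] for subset, parity in answered]
    progress = True
    while progress:
        progress = False
        for query in pending:
            unknown, parity = query
            for i in list(unknown):
                if i in recovered:
                    unknown.discard(i)
                    parity ^= recovered[i]
            query[1] = parity
            if len(unknown) == 1:
                recovered[unknown.pop()] = parity
                progress = True
    return recovered


def simulate(k=50, num_queries=400, d=3, decline_prob=0.2, seed=0):
    """Ask random parity queries and decode; every parameter is an assumption."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(k)]
    answered = []
    for _ in range(num_queries):
        # Some single-object queries are mixed in so that peeling can start.
        size = 1 if rng.random() < 0.2 else d
        subset = rng.sample(range(k), size)
        answer = parity_answer(labels, subset, decline_prob, rng)
        if answer is not None:
            answered.append((subset, answer))
    return labels, peel(answered)


if __name__ == "__main__":
    labels, recovered = simulate()
    print(f"recovered {len(recovered)} of {len(labels)} labels")
```

Because every collected answer is correct, any label the decoder outputs matches the true label; what varies with the number of queries and the subset sizes is how many labels are recovered, which is the tradeoff the abstract quantifies.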

