The rationale of crowdsourcing is to leverage the “wisdom of the crowd” in soliciting some kind of response or suggestion. Crowdsourcing allows for the annotation of large research corpora used to train data-intensive models such as deep neural networks. Platforms such as Mechanical Turk allow crowdsourcing tasks to be implemented and assigned systematically and programmatically. The crowdsourcing estimation problem is especially difficult because both the reliabilities of users and the true answers to the questions are unknown.
1.1 Previous Work
Dawid and Skene studied this estimation problem in 1979 in the context of responses to surveys (Dawid, 1979). In the Dawid-Skene stochastic model, each user has an associated reliability score dictating the probability with which the user answers a binary question correctly. The joint estimation of user reliabilities and correct answers in this model is very well studied in statistics, even for the case of k-ary questions. Popular approaches use Expectation Maximization to find the label and reliability estimates that maximize the likelihood of the observed answers. Recent work uses spectral methods (based on eigenvalues of the assignment graph) to initialize the iterative expectation maximization algorithm, proving optimal convergence rates for such a scheme (Zhang, 2014). Another approach achieving state-of-the-art performance finds the labeling via a minimax conditional entropy approach (Zhou, 2014).
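To make the Expectation Maximization approach concrete, the following is a minimal sketch for the binary, one-reliability-per-user case. It is not the implementation used in any of the cited papers. The function name `em_one_coin`, the encoding of answers as +1/-1 with 0 for "not asked", the uniform prior on labels, the majority-vote initialization, and the add-one smoothing are all assumptions made for illustration.

```python
import math

def em_one_coin(answers, n_iter=50):
    """One-coin Dawid-Skene EM sketch (illustrative assumptions throughout).

    answers[i][j] is +1 or -1 if user i answered question j, and 0 otherwise.
    Returns (labels, reliabilities).
    """
    n_users, n_questions = len(answers), len(answers[0])

    # Initialise P(true label = +1) for each question from a smoothed majority vote.
    post = []
    for j in range(n_questions):
        votes = [answers[i][j] for i in range(n_users) if answers[i][j] != 0]
        positives = sum(1 for v in votes if v == 1)
        post.append((positives + 1.0) / (len(votes) + 2.0))

    rel = [0.5] * n_users
    for _ in range(n_iter):
        # M-step: reliability = smoothed expected fraction of correct answers.
        for i in range(n_users):
            correct, total = 1.0, 2.0  # add-one smoothing keeps rel inside (0, 1)
            for j in range(n_questions):
                a = answers[i][j]
                if a != 0:
                    correct += post[j] if a == 1 else 1.0 - post[j]
                    total += 1.0
            rel[i] = correct / total

        # E-step: posterior over each true label under a uniform label prior.
        for j in range(n_questions):
            log_odds = 0.0
            for i in range(n_users):
                a = answers[i][j]
                if a != 0:
                    log_odds += a * math.log(rel[i] / (1.0 - rel[i]))
            log_odds = max(-700.0, min(700.0, log_odds))  # guard math.exp overflow
            post[j] = 1.0 / (1.0 + math.exp(-log_odds))

    labels = [1 if p >= 0.5 else -1 for p in post]
    return labels, rel

# Toy data: two perfect users, one user with a single mistake, one always-wrong user.
truth = [1, -1, 1, 1, -1, -1]
answers = [
    truth[:],
    truth[:],
    [-truth[0]] + truth[1:],
    [-t for t in truth],
]
est_labels, est_rel = em_one_coin(answers)
```

On this toy input the always-wrong user ends up with an estimated reliability below one half, so their answers count as evidence for the opposite label rather than being discarded.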
Most analyses of these estimation algorithms assume random sampling. However, intermediate estimates of worker reliabilities could conceivably inform better worker-task assignments, which in turn lead to higher-quality final estimates. A 2011 paper by Karger, Oh, and Shah on budget-optimal crowdsourcing tracks multiple instances of distinct assignment graphs but assumes a highly specific “spammer-hammer” model in which every user answers either randomly or correctly. Another approach models the assignment problem as a Markov Decision Process and derives the Optimistic Knowledge Gradient, which computes the conditional expectation of choosing certain workers, assuming a Beta-Bernoulli prior on the reliability of each worker (Chen, 2013). However, the assignment scheme requires extensive recomputation and updating between each individual question-worker assignment, an unrealistic frequency of model updating.
Classical Dawid-Skene Model
In the classical Dawid-Skene model, abbreviated “D-S”, there is a set of users and a set of questions, each question having a binary correct answer. (Without loss of generality, we may map the two possible answers to a fixed pair of numeric labels.) A bipartite matrix, typically called the assignment matrix, indicates whether a given user answered a given question. Additionally, one vector captures the correct answer for each of the questions, and a second vector denotes each user’s reliability. Finally, let A be a stochastic answer matrix such that each recorded entry agrees with the correct answer with probability equal to the responding user’s reliability.
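As an illustration of this generative model, the sketch below samples an answer matrix from an assignment matrix, a vector of correct answers, and a vector of reliabilities. The symbols of the original text did not survive, so every name here (`simulate_answers`, `assignment`, `true_labels`, `reliabilities`) and the +1/-1 label encoding with 0 for "not asked" are assumptions chosen for the example.

```python
import random

def simulate_answers(true_labels, reliabilities, assignment, rng):
    """Sample an answer matrix under a one-coin Dawid-Skene model (assumed encoding).

    true_labels[j]   : +1 or -1, the correct answer to question j
    reliabilities[i] : probability that user i answers correctly
    assignment[i][j] : 1 if user i is asked question j, else 0
    Returns answers[i][j] in {+1, -1} where assigned, and 0 where not assigned.
    """
    answers = []
    for i, p in enumerate(reliabilities):
        row = []
        for j, q in enumerate(true_labels):
            if assignment[i][j] == 0:
                row.append(0)      # user i was never asked question j
            elif rng.random() < p:
                row.append(q)      # correct with probability p
            else:
                row.append(-q)     # otherwise the opposite label
        answers.append(row)
    return answers

rng = random.Random(0)
true_labels = [1, -1, 1, -1]
reliabilities = [1.0, 0.0, 0.7]
assignment = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
answers = simulate_answers(true_labels, reliabilities, assignment, rng)
```

The first two users are deterministic edge cases (always right, always wrong), which makes the role of the reliability parameter easy to check by eye.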
2 Method for Quasi-Online Task Allocation
We derive an improved task allocation scheme and model the crowdsourcing problem as an optimization problem with an information-theoretic objective, since we want to assign workers to tasks to gain the most information about the true label of the question.
Previous unpublished work by the same authors (Cabreros, 2015) proposes a two-step estimation method that spends a fixed budget in two stages. Running a crowdsourcing estimation algorithm once half of the budget has been allocated yields estimates of the true answers, which may be combined with a mixture model on the topics of questions to estimate the reliability of each user. During the remaining portion of the budget, users who are more likely to provide correct answers for specific questions are sampled to arrive at a final answer matrix A. This final answer matrix A is then used as input to the same black-box estimation algorithm to arrive at the final estimates, as depicted in Figure 1.
How do we decide which questions require more budget allocated? We consider information-theoretic metrics. Mutual information sums over all possible outcomes of the random variables X, Y and measures the mutual dependence between the two variables by quantifying the “amount of information” obtained about one random variable by observing the other (Cover, 2006). A variant of the mutual information between realized outcomes x, y of the random variables is the pointwise mutual information
$$\operatorname{pmi}(x; y) = \log \frac{p(x, y)}{p(x)\,p(y)},$$
which has found applications in statistical natural language processing. We define another variant of mutual information, the partial mutual information pMI(X; Y), between a random variable X on the source channel and observed outcomes of the random variable Y, where we only integrate over the randomness of the unknown source channel.
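The three quantities can be compared on a small joint distribution. The mutual information and pointwise mutual information below follow their standard definitions. The exact formula for the partial mutual information is not recoverable from the text, so `partial_mi` uses one plausible reading, labeled as an assumption: fix the observed outcome y and average the pointwise mutual information over X under p(x | y). All function names are illustrative.

```python
import math

def marginals(joint):
    """Row marginal p(x) and column marginal p(y) of joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def pointwise_mi(joint, x, y):
    """Standard pointwise mutual information log p(x,y) / (p(x) p(y)), in nats."""
    px, py = marginals(joint)
    return math.log(joint[x][y] / (px[x] * py[y]))

def mutual_information(joint):
    """Standard mutual information: sum over all outcomes of X and Y."""
    px, py = marginals(joint)
    return sum(
        joint[x][y] * math.log(joint[x][y] / (px[x] * py[y]))
        for x in range(len(px)) for y in range(len(py))
        if joint[x][y] > 0
    )

def partial_mi(joint, y):
    """ASSUMED reading of pMI: sum over x only, weighted by p(x | y)."""
    px, py = marginals(joint)
    return sum(
        (joint[x][y] / py[y]) * math.log(joint[x][y] / (px[x] * py[y]))
        for x in range(len(px))
        if joint[x][y] > 0
    )

# A noisy binary channel: X and Y agree 80% of the time.
joint = [[0.4, 0.1], [0.1, 0.4]]
px, py = marginals(joint)
mi = mutual_information(joint)
averaged = sum(py[y] * partial_mi(joint, y) for y in range(len(py)))
```

Under this assumed definition, averaging the partial quantity over the outcomes of Y recovers the full mutual information, which is what `averaged` checks against `mi`.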
In our notation:
However, solving the cardinality-constrained integer program to assign users to questions is most likely NP-hard, since Krause and Golovin (Krause, 2012) reduce a similar formulation of an entropy minimization problem to independent set. In a “one-shot” setting where the remaining budget is less than the number of questions, the best strategy is to choose the questions with the best relative improvements in the partial mutual information, where the improvement for each question is taken with respect to querying a particular user. Estimating this improvement already solves a “nested optimization” that identifies which user is the best to assign to a given question.
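The one-shot rule can be sketched as a greedy top-k selection. The sketch assumes a precomputed table `gain[j][i]` holding the estimated improvement in partial mutual information from asking user i about question j. That table, the function name `one_shot_allocation`, and the tie-breaking by question index are hypothetical and only illustrate the selection step, not how the improvements themselves are estimated.

```python
def one_shot_allocation(gain, budget):
    """Pick up to `budget` (question, user) pairs with the largest best-user gain.

    gain[j][i] is a hypothetical estimate of the improvement obtained by
    asking user i about question j.
    """
    best = []
    for j, row in enumerate(gain):
        # Nested optimization: the best user for this question.
        i_best = max(range(len(row)), key=lambda i: row[i])
        best.append((row[i_best], j, i_best))

    # Largest improvement first; ties broken by question index.
    best.sort(key=lambda t: (-t[0], t[1]))
    return [(j, i) for _, j, i in best[:budget]]

gain = [
    [0.10, 0.30],   # question 0: user 1 is best
    [0.50, 0.20],   # question 1: user 0 is best
    [0.05, 0.04],   # question 2: little to gain from either user
]
chosen = one_shot_allocation(gain, budget=2)
```

With a budget of two, the third question is left unqueried because its best available improvement is the smallest.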
We evaluate the error rates of three methods: (1) running Dawid-Skene estimation on a randomly sampled budget; (2) running Dawid-Skene estimation on a randomly sampled budget and assigning the remaining budget via the one-shot allocation; and (3) method (2), but assigning the remaining budget via the dynamic task allocation. We evaluate the error rates over 20 random samples of questions and user reliabilities for 1000 users and 100 questions, assuming a mixture of 2 topics, at 10 different budgets that vary the fraction of users assigned to each question; results are shown in Figure 2.
We also consider how the dimensions of the estimation problem impact the performance of these sampling policies in Figure 3: when users outnumber questions, the one-shot allocation does poorly while the new dynamic user allocation is still robust.
However, examining the intermediate accuracies yielded by the dynamic task allocation method indicates that the strength of this method lies in its ability to achieve better estimation accuracy with fewer samples and to terminate the estimation process early. Can we use the estimates of mutual information in an optimal stopping framework to develop more data-efficient crowdsourcing estimation algorithms? “Efficiency” will be measured with respect to random sampling: we want to show that our method is efficient in terms of the number of samples it uses relative to the number needed by random sampling. We will consider this question in the full paper, and we conjecture that one point of connection is between the channel capacity and the Fisher information. The analysis would proceed by considering the minimax convergence rates of the D-S estimator, proved in (Gao, 2013), to analyze the disadvantage of using fewer samples.
We model task assignment in the crowdsourcing problem and develop a probabilistic model for a “partial mutual information” criterion, which yields a one-step lookahead policy for choosing which questions to ask next. We implement a batch estimation policy which exhausts the budget by querying an additional label for each question, re-estimating the user reliabilities, and using these updated estimates to re-compute the label estimates. We are able to show significant improvement in estimation accuracy for small budgets. Our dynamic task allocation scheme is also robust for estimation schemes with higher ratios of questions to users, where the one-shot learning policy suffers higher error than random sampling. A further advantage of such a scheme is that the mutual information heuristic we develop may be extended to evaluate early termination of the estimation process.
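The structure of such a batch policy (query one more label per question, re-estimate, recompute labels) can be sketched as a loop. This sketch deliberately swaps in simpler components than the ones described here: users are chosen by current estimated reliability rather than by the partial mutual information criterion, and reliabilities are re-estimated by smoothed agreement with a majority vote rather than by Dawid-Skene estimation. The names `batch_allocation` and `query` are hypothetical.

```python
import math

def batch_allocation(query, n_users, n_questions, n_rounds):
    """Batch loop sketch; `query(i, j)` must return user i's +1/-1 answer to question j."""
    answers = [[0] * n_questions for _ in range(n_users)]
    rel = [0.5] * n_users
    labels = [1] * n_questions
    for _ in range(n_rounds):
        # Step 1: one additional label per question, from the unused user
        # with the highest current reliability estimate (stand-in criterion).
        for j in range(n_questions):
            unused = [i for i in range(n_users) if answers[i][j] == 0]
            if unused:
                i = max(unused, key=lambda u: rel[u])
                answers[i][j] = query(i, j)

        # Step 2: re-estimate reliabilities as smoothed agreement with the
        # unweighted majority vote (stand-in for the black-box estimator).
        majority = [1 if sum(answers[i][j] for i in range(n_users)) >= 0 else -1
                    for j in range(n_questions)]
        for i in range(n_users):
            asked = [j for j in range(n_questions) if answers[i][j] != 0]
            agree = sum(1 for j in asked if answers[i][j] == majority[j])
            rel[i] = (agree + 1.0) / (len(asked) + 2.0)

        # Step 3: recompute labels with a reliability-weighted (log-odds) vote.
        weights = [math.log(r / (1.0 - r)) for r in rel]
        labels = [1 if sum(weights[i] * answers[i][j] for i in range(n_users)) >= 0 else -1
                  for j in range(n_questions)]
    return labels, rel, answers

# Toy oracle: user 0 is always wrong, every other user is always right.
truth = [1, -1, -1, 1]

def oracle(i, j):
    return -truth[j] if i == 0 else truth[j]

labels, rel, answers = batch_allocation(oracle, n_users=5, n_questions=4, n_rounds=3)
```

Each round spends exactly one label per question, so the total budget used is the number of rounds times the number of questions.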
- Cabreros (2015) Cabreros, Irineo; Singh, Karan; Zhou, Angela. Mixture model for crowdsourcing. 2015.
- Chen (2013) Chen, Xi; Lin, Qihang; Zhou, Dengyong. Optimistic knowledge gradient policy for optimal allocation in crowdsourcing. Journal of Machine Learning Research, 2013.
- Cover (2006) Cover, Thomas; Thomas, Joy. Elements of Information Theory. Wiley, 2006.
- Dawid (1979) Dawid, A. P.; Skene, A. M. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society, 1979.
- Gao (2013) Gao, Chao; Zhou, Dengyong. Minimax optimal convergence rates for estimating ground truth from crowdsourced labels. MSR Technical Report, 2013.
- Krause (2012) Krause, Andreas; Golovin, Daniel. Submodular function maximization. 2012.
- Zhang (2014) Zhang, Yuchen; Chen, Xi; Zhou, Dengyong; Jordan, Michael. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. NIPS, 2014.
- Zhou (2014) Zhou, D.; Liu, Q.; Platt, J. C.; Meek, C. Aggregating ordinal labels from crowds by minimax conditional entropy. Proceedings of ICML, 2014.