Crowdsourcing with Meta-Workers: A New Way to Save the Budget

Due to the unreliability of Internet workers, it is difficult to complete a crowdsourcing project satisfactorily, especially when the tasks are numerous and the budget is limited. Recently, meta learning has brought new vitality to few-shot learning, making it possible to obtain a classifier with fair performance using only a few training samples. Here we introduce the concept of the meta-worker, a machine annotator trained by meta learning for types of tasks (e.g., image classification) that are well suited to AI. Unlike regular crowd workers, meta-workers can be reliable, stable, and, more importantly, tireless and free. We first cluster unlabeled data and ask crowd workers to repeatedly annotate the instances near the cluster centers; we then leverage the annotated data and meta-training datasets to build a group of meta-workers using different meta learning algorithms. Subsequently, meta-workers are asked to annotate the remaining crowdsourced tasks. The Jensen-Shannon divergence is used to measure the disagreement among the annotations provided by the meta-workers, which determines whether or not crowd workers should be invited to further annotate the same task. Finally, we model the meta-workers' preferences and compute the consensus annotations by weighted majority voting. Our empirical study confirms that, by combining machine and human intelligence, we can accomplish a crowdsourcing project with a lower budget than state-of-the-art task assignment methods, while achieving superior or comparable quality.

1 Introduction

Crowdsourcing originally referred to the practice of an organization outsourcing tasks, otherwise performed by employees, to a large number of volunteers [9]. Recently, crowdsourcing has become the only viable way to annotate massive data, through the hiring of a large number of inexpensive Internet workers [23]. Although a variety of tasks can be crowdsourced, a particularly common one is the annotation of images (e.g., ImageNet) for data-driven machine learning algorithms such as deep learning. However, due to the difficulty of the tasks, poor task descriptions, the diverse capacity ranges of workers, and so on [14, 3], we often need to invite multiple workers to annotate the same data to improve the label quality [22]. This limits the use of crowdsourcing when the available budget is limited. As such, we face the need of obtaining quality data with a tight budget. Proposed solutions focus on modeling crowdsourced tasks [46, 1], workers [35], or crowdsourcing processes [48, 16, 28] to achieve a better understanding of them, thereby reducing the impact of incompetent workers and the number of repeated annotations, while improving the quality.

We argue that meta learning can offer a solution to this challenge [30]. Meta learning imitates the human learning process: with only a few data points in the target domain, the learner can quickly adapt to recognize a new class of objects. For example, state-of-the-art meta learning algorithms can achieve an accuracy of nearly 60% on 5-class classification tasks on the Mini-ImageNet dataset with only one training instance per class (the so-called 5-way 1-shot setting) [25]. This is comparable to the capability of the majority of human workers on real-world crowdsourcing platforms [12]. As such, we can model crowdsourcing employees as meta learners: with a small amount of guidance, they can quickly learn new skills to accomplish crowdsourced tasks.

To make this idea concrete, we introduce the notion of meta-worker, a virtual worker trained via a meta learning algorithm that can quickly generalize to new tasks. Specifically, our crowdsourcing process is formulated as follows. Given a crowdsourcing project, we first partition the tasks into different clusters. We then collect a batch of tasks close to each cluster center and ask the crowd workers to annotate them until an 'N-way K-shot' meta-test dataset is obtained. We also build our meta-training datasets by collecting data from the Internet. Different meta learning algorithms are used to generate a group of diverse meta-workers. We then employ the meta-workers to annotate the remaining tasks; we measure the disagreement among their annotations using the Jensen-Shannon divergence, and consequently decide whether or not to further invite crowd workers to provide additional annotations. Finally, we model the meta-workers' preferences and use weighted majority voting (WMV) to compute the consensus labels, and iteratively optimize the latter until convergence is reached.

The main contributions of our work are as follows:
(i) This work is the first effort in the literature to directly supplement human workers with machine classifiers for crowdsourcing. The results indicate that machine intelligence can limit the use of crowd workers and achieve quality control.
(ii) We use meta-learning to train meta-workers. In addition, we employ ensemble learning to boost the meta-workers’ ability of producing reliable labels. Most simple tasks do not require the participation of human workers, thus enabling budget savings.
(iii) Experiments on real datasets prove that our method achieves the highest quality while using comparable or far less budget than state-of-the-art methods, and the amount of budget-saving grows as the scale of tasks increases.

2 Related Work

Our work is mainly related to two research areas: crowdsourcing and meta learning. Meta learning (or learning to learn) is inspired by the ability of humans to use previous experience to quickly learn new skills [30, 17]. The meta learning paradigm trains a model using a large amount of data from different source domains where data are available, and then fine-tunes the model using a small number of samples from the target domain.

In recent years, a variety of meta learning approaches have been introduced. Few-shot meta learning methods can be roughly grouped into three categories, namely optimization-based, model-based (or memory-based, black-box), and metric-based (or non-parametric) methods. Optimization-based methods treat meta learning as an optimization problem and extract the meta-knowledge required to improve the optimization performance. For example, Model-Agnostic Meta-Learning [6] looks for a set of initialization values of the model parameters that leads to a strong generalization capability; the idea is to enable the model to quickly adapt to new tasks using few training instances. Model-based methods train a neural network to predict the output based on the model's current state (determined by the training set) and the input data; the meta-knowledge extraction and the meta learning process are wrapped in the model training process. Memory-Augmented Neural Networks [21] and Meta Networks [20] are representative model-based methods. Metric-based methods perform non-parametric learning at the inner (task) level by simply comparing validation points with training points, and assign a validation point to the category of the closest training point. Representative methods include Siamese networks [2], prototypical networks [24] and relation networks [25]. Here the meta-knowledge is given by the distance or similarity metric.
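To make the metric-based idea concrete, here is a minimal sketch (not taken from any of the cited papers) that classifies a query point by its distance to class prototypes computed from a small support set; the embeddings are synthetic stand-ins for learned features.

```python
# Simplified prototypical-style classification on synthetic embeddings.
# Real metric-based methods learn the embedding (and, for relation
# networks, the comparison function); here both are fixed for brevity.
import numpy as np

rng = np.random.default_rng(0)
n_way, k_shot, dim = 5, 1, 16

# Synthetic support embeddings: k_shot vectors per class.
support = rng.normal(size=(n_way, k_shot, dim))
prototypes = support.mean(axis=1)              # one prototype per class

# A query embedding drawn near class 2.
query = prototypes[2] + 0.1 * rng.normal(size=dim)

# Assign the query to the class with the nearest prototype.
dists = np.linalg.norm(prototypes - query, axis=1)
print("predicted class:", int(np.argmin(dists)))
```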

Few-shot meta learning can leverage multi-source data to better mine the information of the target domain, which is consistent with the aim of budget saving in crowdsourcing. A lot of effort has been put into achieving budget savings in crowdsourcing [15]. One way is to reduce the number of tasks to be performed, such as task pruning (prune the tasks that machines can do well) [33], answer deduction (prune the tasks whose answers can be deduced from already crowdsourced tasks) [34], and task selection (select the most beneficial tasks to crowdsource) [19, 39, 29, 40]. Another approach is to reduce the cost of individual tasks, or to dynamically determine the price of a single task, mainly through better task (flow) design [18, 45, 27, 26].

Our work is also related to semi-supervised learning. Semi-supervised self-training [38, 37] gradually augments the labeled data with new instances, whose labels have been inferred with high confidence, until the unlabeled data pool is empty or the learner no longer improves. The effectiveness of this approach depends on the added value of the augmented labeled data. Furthermore, the model needs to be updated every time an instance is added to the labeled set, which is not feasible in a large crowdsourcing project. Some active learning based crowdsourcing approaches [5, 13, 40] also suffer from these issues. [42] trained a group of classifiers using cleaned crowdsourced data; the classifiers are then used to correct potentially noisy labels. Unlike the above solutions, our approach is feasible because we directly use meta-workers, trained by meta learning, to annotate unlabeled data. Meta-workers can quickly generalize to new tasks and achieve a good performance with the support of only a few instances. In contrast, existing methods canonically depend on sufficient training instances for each category to enable machine-intelligence-assisted crowdsourcing [13, 37, 42, 43]. In addition, our model learns meta-knowledge from external free data, which considerably reduces the budget. Thanks to the cooperation among diverse meta-workers and to ensemble learning, we can further boost the performance of a group of meta-workers without the need of frequent updates.

3 Proposed Methodology

3.1 Definitions

In this section, we formalize the Crowdsourcing with Meta-Workers problem setup in detail. A meta model usually consists of two learners: the upper one, called the 'meta learner', has the duty of extracting meta-knowledge to guide the optimization of the bottom learner; the bottom one, called the 'base learner', executes the classification job. To achieve this, the model is first trained on a group of different machine learning tasks (as in multi-task learning, MTL [44]), named the 'meta-training set', so that the model becomes capable across different tasks; the model then moves to its target domain, called the 'meta-test set'. More precisely, the meta learner takes one classical machine learning task as a meta-training instance and a group of such tasks as the meta-training set, extracts meta-knowledge from them, and then uses this meta-knowledge to guide the training and generalization of the base learner on the target domain. To eliminate ambiguity, following the general naming rules of meta learning, we use 'train/test' to distinguish the instances (classical machine learning tasks) used by the meta learner, and 'support/query' to distinguish the instances (instances in classical machine learning) used by the base learner. The composition of the dataset required for meta learning is shown in Fig. 1.

Fig. 1: An example of a 5-way 1-shot meta learning dataset. From the perspective of the meta learner, the data can be divided into a meta-training set and a meta-test set. The former is composed of the datasets of many independent tasks, and the latter can be separated into a support set and a query set. The label space size N is called N-way, and the number of instances per label in the support set is called K-shot.
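As an illustration of this data organization, the sketch below samples one N-way K-shot episode (support plus query) from a labeled pool; the pool, class names and sizes are synthetic and only meant to show the episode structure.

```python
# Sample one N-way K-shot episode (support + query) from a labeled pool.
# The pool here is synthetic; in the paper the meta-training episodes come
# from auxiliary datasets and the meta-test episode from the crowd project.
import random
from collections import defaultdict

def sample_episode(pool, n_way=5, k_shot=1, q_query=15):
    by_class = defaultdict(list)
    for x, y in pool:
        by_class[y].append(x)
    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):          # relabel classes 0..n_way-1
        items = random.sample(by_class[cls], k_shot + q_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query

# Toy pool: 20 classes, 30 instances each (instances are just ids here).
pool = [(f"img_{c}_{i}", c) for c in range(20) for i in range(30)]
s, q = sample_episode(pool)
print(len(s), "support and", len(q), "query instances")
```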

Consider a crowdsourcing project with T tasks, where each task belongs to one out of N classes. We cluster the tasks into N categories and select K instances from each cluster to be annotated by crowd workers. The resulting annotated tasks form the meta-test support set S and are also used to estimate the crowd workers' capacity; the remaining tasks form the query set Q, and together they constitute our meta-test dataset. We also need to collect auxiliary datasets related to the tasks at hand to build our meta-training dataset, where each auxiliary dataset is an independent machine learning task dataset. The diversity of the meta-training dataset guarantees the generalization ability. In the few-shot learning paradigm, this setup is called an 'N-way K-shot' problem. Table I summarizes the notation used in this paper.

Item | Symbol | Remarks
number of crowd tasks | T | T tasks in total
task / instance | x_i | i = 1, 2, …, T
task labels' vector | y | size T, values in [1, 2, …, N]
label space size | N | called N-way
support instance number | K | called K-shot
meta-test task type | S / Q | support / query set
worker (set) | w_j (W) | M meta-workers and C crowd workers in total
worker type | meta / crowd | machine or human annotator
confusion matrix | π_j | size N × N, models a meta-worker
accuracy / capacity | α_j | decimal, models a crowd worker
worker's annotations | A_j | size T
annotation | a_ij | size N
meta algorithms (set) | Λ | M algorithms in total
divergence threshold | τ | difficulty criterion
TABLE I: List of symbols.

3.2 Workflow

The workflow of our approach is shown in Fig. 2. After obtaining the support set S (completely labeled) and the meta-test set (partially labeled), the problem becomes a standard 'N-way K-shot' meta learning problem. We apply a meta learning algorithm on the meta-training set to extract the meta-knowledge, and then combine it with the meta-test support set to adapt to the target task domain, obtaining a meta-worker (i.e., a classifier). By using different meta learning algorithms, we can obtain a group of meta-workers with different preferences, which are then used to produce the annotation matrices of the remaining tasks in Q. If the meta-workers disagree with one another on a task, we invite crowd workers to provide further annotations for that task. Finally, we use the confusion matrices and the accuracy values to separately model the preferences of meta-workers and of crowd workers, and compute the consensus labels by weighted majority voting in an iterative manner.

Fig. 2: Workflow of our approach. The four steps are represented using different colors. The meta-test set (dashed box) is built from the project via clustering and crowdsourcing. The support set, together with the meta-knowledge extracted from the meta-training set, generates the meta-workers; the latter annotate the query set to obtain the annotation matrices. If the meta-workers disagree on a task (i.e., its divergence exceeds τ), crowd workers are asked to provide further annotations (red dashed line). Finally, we aggregate all the annotations to acquire the consensus labels.

3.3 Building the Meta-test Set

The first step of our method transforms a crowdsourcing project of T tasks into an N-way K-shot meta-test set. To build an N-way K-shot meta-test support set from unlabeled instances, we use k-means to cluster the instances into N clusters (other clustering algorithms can be used as well). We then select the K instances closest to each cluster center to be annotated. Since the clustering results are not perfect, some of the selected instances might not belong to the assigned cluster; therefore, building the support set requires slightly more than N × K instances.
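A minimal sketch of this selection step is given below, using scikit-learn's k-means on generic feature vectors; the feature representation and the small over-selection margin are illustrative assumptions rather than the exact procedure of the paper.

```python
# Cluster unlabeled task features and pick the instances closest to each
# cluster center as candidates for crowd annotation (support-set building).
import numpy as np
from sklearn.cluster import KMeans

def select_support_candidates(features, n_way=5, k_shot=5, margin=2):
    km = KMeans(n_clusters=n_way, n_init=10, random_state=0).fit(features)
    selected = []
    for c in range(n_way):
        # Distance of every instance to this cluster center.
        d = np.linalg.norm(features - km.cluster_centers_[c], axis=1)
        # Over-select by a small margin, since clustering is imperfect and
        # some candidates may turn out to belong to another class.
        selected.append(np.argsort(d)[: k_shot + margin])
    return km.labels_, selected

features = np.random.rand(3000, 64)          # stand-in for image features
labels, candidates = select_support_candidates(features)
print([len(c) for c in candidates])
```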

The label quality of the meta-test support set is crucial. As such, we ask as many workers as possible to provide annotations. A basic assumption in crowdsourcing is that the aggregated annotations given by a large number of workers are reliable. For example, with crowd workers whose average accuracy is 0.6 on a task with 5 classes, even if the simplest majority voting is used, the expected accuracy of 10 repeated annotations from 10 crowd workers is about 95%, and that of 30 repeated annotations is above 99.95%. Once the support set S is attained, the remaining tasks form the query set Q, and the two together constitute the meta-test set.
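The reliability claim above can be checked with a quick Monte Carlo simulation, assuming (as in the example) accuracy 0.6, 5 classes, and errors spread uniformly over the wrong classes.

```python
# Monte Carlo check of plurality-vote accuracy for repeated annotations.
import random
from collections import Counter

def vote_accuracy(n_workers, acc=0.6, n_classes=5, trials=20000):
    correct = 0
    for _ in range(trials):
        votes = []
        for _ in range(n_workers):
            if random.random() < acc:
                votes.append(0)                      # true label
            else:
                votes.append(random.randint(1, n_classes - 1))
        winner, _ = Counter(votes).most_common(1)[0]
        correct += (winner == 0)
    return correct / trials

print(vote_accuracy(10))   # roughly 0.95 with 10 annotations
print(vote_accuracy(30))   # very close to 1.0 with 30 annotations
```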

Although the ground truth of each task in crowdsourcing is unknown and it is hard to estimate a worker's quality, we can still approximate the ground truth for a small portion of tasks (called golden tasks) and use them to estimate workers' quality [47, 29]. In this way, we can pre-identify low-quality workers based on the support set, and prevent them from participating in the subsequent crowdsourcing process. The modeling of crowd workers' quality is discussed in detail in Section 3.6.

3.4 Training Meta-workers

We need to build meta-workers using meta learning algorithms. We choose one representative method from each of the three meta learning categories to form our group of meta-workers, namely Model-Agnostic Meta-Learning (MAML) [6], Meta Networks (MN) [20], and Relation Networks (RN) [25], all under an N-way K-shot setting. Each induces a different learning bias, so together they can lead to effective ensembles.

Meta learning trains a model on a variety of learning tasks. The model is then fine-tuned to solve new learning tasks using only a small number of training samples of the target task domain [30]. The general meta-learning algorithm can be formalized as follows:

ω* = argmax_ω log p(ω | D_train)    (1)
θ* = argmax_θ log p(θ | ω*, S)    (2)

where θ represents the model parameters we want to learn, ω is the meta-knowledge extracted from the meta-training set D_train, and S is the meta-test support set. Eq. (1) corresponds to the meta-knowledge learning phase, and Eq. (2) to the meta-knowledge adaptation phase.

To this end, MAML optimizes the initial values of the model parameters to enable the model to adapt quickly to new tasks during gradient descent; the meta-knowledge is given by the gradient descent direction. MN has the ability to remember old data and to assimilate new data quickly; it learns meta-level knowledge across tasks and shifts its inductive biases along the direction of error reduction. RN learns a deep distance metric to compare a small number of instances within episodes.
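For intuition, the following is a minimal MAML-style inner/outer loop on synthetic episodes (a sketch of the optimization-based idea, not the authors' implementation or configuration; the network, learning rates, and data are illustrative).

```python
# Minimal MAML-style meta-training loop on synthetic N-way K-shot episodes.
import torch
import torch.nn.functional as F

def functional_forward(x, w):
    # Tiny two-layer MLP evaluated at an explicit parameter list, so that it
    # can also be evaluated at the inner-loop-adapted parameters.
    h = F.relu(F.linear(x, w[0], w[1]))
    return F.linear(h, w[2], w[3])

def sample_task(n_way=5, k_shot=1, n_query=15, dim=32):
    # Synthetic stand-in for an N-way K-shot episode.
    protos = torch.randn(n_way, dim)
    def split(n_per_class):
        x = torch.cat([p + 0.3 * torch.randn(n_per_class, dim) for p in protos])
        y = torch.arange(n_way).repeat_interleave(n_per_class)
        return x, y
    return split(k_shot), split(n_query)

# Meta-parameters: the initialization that MAML learns.
weights = [(0.1 * torch.randn(64, 32)).requires_grad_(),
           torch.zeros(64, requires_grad=True),
           (0.1 * torch.randn(5, 64)).requires_grad_(),
           torch.zeros(5, requires_grad=True)]
meta_opt = torch.optim.Adam(weights, lr=1e-3)
inner_lr = 0.4

for step in range(100):                      # outer loop over sampled tasks
    (xs, ys), (xq, yq) = sample_task()
    # Inner loop: one gradient step on the support set.
    support_loss = F.cross_entropy(functional_forward(xs, weights), ys)
    grads = torch.autograd.grad(support_loss, weights, create_graph=True)
    adapted = [w - inner_lr * g for w, g in zip(weights, grads)]
    # Outer objective: loss of the adapted parameters on the query set.
    query_loss = F.cross_entropy(functional_forward(xq, adapted), yq)
    meta_opt.zero_grad()
    query_loss.backward()
    meta_opt.step()
```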

3.5 Obtaining Annotations

After the meta-workers have been adapted to the target task domain, they can be used to replace, or work together with, the crowd workers to provide annotations for the remaining tasks in Q, and thus save the budget. Although we consider multiple meta-workers to improve the quality of crowdsourcing, there may exist difficult tasks that cannot be performed well by meta-workers. Therefore, we model the difficulty of all the tasks, and select the difficult ones to be annotated by crowd workers.

There are many criteria to quantify the difficulty of a task [15, 29, 40]. Here, we adopt a simple and intuitive criterion: the more difficult a task is, the harder it is for meta-workers to reach an agreement on it, and the larger the divergence between the task's annotations. As such, we can approximate the difficulty of a task by measuring the annotation divergence. Since the annotation given by a meta-worker is a label probability distribution, the KL divergence (Eq. (3)) can be used to measure the difference between any two distributions:

D_KL(P ∥ Q) = Σ_z P(z) log ( P(z) / Q(z) )    (3)

where P and Q are discrete probability distributions.

However, the direct use of the KL divergence has two disadvantages: it is asymmetric, and it can only measure the divergence between two annotations. The asymmetry makes it necessary to consider the order of the annotations, which makes measuring the divergence of multiple annotations more complicated and tedious. For these reasons, we use the symmetric Jensen-Shannon (JS) divergence (Eq. (4)), an extension of the KL divergence, to measure the divergence of all possible annotation pairs, and take their average value as the final divergence (Eq. (5)):

D_JS(P ∥ Q) = ½ D_KL(P ∥ (P + Q)/2) + ½ D_KL(Q ∥ (P + Q)/2)    (4)

Div(A_i^meta) = ( 2 / (M(M − 1)) ) Σ_{j<l} D_JS(a_ij ∥ a_il)    (5)

where A_i^meta = {a_ij} represents the set of M meta annotations for task i.

Once we have collected the meta annotations, we calculate the JS divergence of each task, pick out the tasks whose divergence is greater than the threshold τ, and submit them to crowd workers for further annotation. We assign n additional crowd workers with fair quality to each difficult task, and obtain the crowd annotations. Finally, all annotations are gathered to compute the consensus labels.
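A small sketch of this difficulty test: it computes the average pairwise JS divergence among meta-worker label distributions for a task and compares it to a threshold; the threshold value and the toy annotations are illustrative.

```python
# Flag "difficult" tasks by average pairwise JS divergence among
# meta-worker label distributions.
import numpy as np
from itertools import combinations

def kl(p, q, eps=1e-12):
    p, q = np.clip(p, eps, 1), np.clip(q, eps, 1)
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def task_divergence(annotations):
    # annotations: list of probability vectors, one per meta-worker
    pairs = list(combinations(annotations, 2))
    return sum(js(p, q) for p, q in pairs) / len(pairs)

# Example: three meta-workers annotating one 5-class task.
a = [np.array([0.7, 0.1, 0.1, 0.05, 0.05]),
     np.array([0.6, 0.2, 0.1, 0.05, 0.05]),
     np.array([0.1, 0.1, 0.6, 0.1, 0.1])]
tau = 0.3   # divergence threshold (illustrative value)
needs_crowd = task_divergence(a) > tau
print(round(task_divergence(a), 3), needs_crowd)
```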

3.6 Aggregating the Annotations

3.6.1 Correcting Annotations

During the last step we compute the consensus labels of the tasks. Meta annotations and crowd annotations are inherently different: the former are discrete probability distributions over the label space, while the latter are one-hot encodings over the label space. Therefore, we use different strategies to model meta-workers and crowd workers. A meta-worker's probability distribution annotation gives the probabilities that the instance belongs to each class, which is suitable for a D&S model [4, 36, 10]. The one-hot crowd annotations, instead, simply indicate the chosen most likely label for an instance. Furthermore, the number of crowd annotations is smaller than that of meta annotations. As such, we cannot build a complex model for crowd workers based on their annotations, so we choose the simple but effective Worker Probability model (also known as capacity or accuracy) [8, 11, 41] to model them.

Since not all the tasks are annotated by crowd workers, the crowd annotation matrix is incomplete, and we use negative 'dummy' annotations to fill it up. To eliminate the difference between meta annotations and crowd ones, and to model crowd workers, we introduce an accuracy value α_j for each crowd worker j, and transform a crowd worker's annotation as follows:

ã_ij(z) = α_j, if a_ij(z) = 1;    ã_ij(z) = (1 − α_j) / (N − 1), if a_ij(z) = 0    (6)

We perform the above transformation on each bit of the annotation vector a_ij. The accuracy α_j of each crowd worker is initialized when the support set is attained (Section 3.3).

The D&S model focuses on single-label tasks (with N fixed choices) and models the bias of each meta-worker as a confusion matrix π_j of size N × N. The entry π_j(z, z′) models the probability that worker j wrongly assigns label z′ to an instance of true label z. We use the confusion matrix of a meta-worker to correct its annotations using Eq. (7), where ã_ij is the corrected annotation. We initialize each confusion matrix with an identity matrix of size N × N in the first iteration.

ã_ij(z) ∝ Σ_{z′} π_j(z, z′) · a_ij(z′)    (7)
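The two correction steps can be sketched as follows; the functional forms follow our reading of Eqs. (6) and (7) above and are meant as an illustration, not as the authors' exact implementation.

```python
# Hedged sketch of the two annotation-correction steps described above.
import numpy as np

def crowd_to_distribution(one_hot, accuracy, n_classes):
    # Spread a one-hot crowd annotation into a distribution using the
    # worker's estimated accuracy (one plausible form of Eq. (6)).
    dist = np.full(n_classes, (1.0 - accuracy) / (n_classes - 1))
    dist[np.argmax(one_hot)] = accuracy
    return dist

def correct_meta_annotation(prob, confusion):
    # Re-weight a meta-worker's label distribution by its confusion
    # matrix and renormalize (one plausible form of Eq. (7)).
    corrected = confusion @ prob
    return corrected / corrected.sum()

n = 5
one_hot = np.eye(n)[2]                       # crowd worker voted class 2
print(crowd_to_distribution(one_hot, accuracy=0.7, n_classes=n))

confusion = np.full((n, n), 0.05) + np.eye(n) * 0.75   # row-stochastic example
prob = np.array([0.1, 0.1, 0.6, 0.1, 0.1])   # meta-worker annotation
print(correct_meta_annotation(prob, confusion))
```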

3.6.2 Inferring Labels

Once the above correction has been performed via Eqs. (6) and (7), we compute the consensus labels using weighted majority voting on the corrected annotations. We then use the inferred labels to update the confusion matrices of the meta-workers and the accuracy values of the crowd workers. We use the EM algorithm [4, 10, 33] to alternately optimize the worker models (π and α) and the consensus labels until convergence. The detailed process is as follows.

E-step: We use Eqs. (6) and (7), together with the current worker models, to correct the crowd and the meta annotations. We then combine the corrected crowd and meta annotations at the task level to obtain a larger annotation tensor of size T × (M + C) × N, where the first dimension is the number of tasks, the second is the number of workers, and each entry is an annotation vector of size N. We sum all the annotations, task by task, and select the label corresponding to the position with the highest probability value as the estimated ground truth, as described in Eq. (8) (each annotation ã_ij is a vector of size N).

ŷ_i = argmax_z Σ_j ã_ij(z)    (8)

M-step: Here we use the corrected annotations and the estimated labels ŷ to update the worker models. A formal description is given in Eq. (10). Eq. (9) shows how to count the number c_j of tasks correctly answered by worker j, where 1(·) is an indicator function that equals 1 if its argument is true and 0 otherwise, and A_j is worker j's corresponding annotation matrix.

c_j = Σ_i 1( argmax_z a_ij(z) = ŷ_i )    (9)

To update the accuracy α_j of a crowd worker j, we first count the number of tasks that the worker has correctly annotated using Eq. (9), and then normalize this count by the total number of annotations the worker has provided. Note that, in general, a crowd worker does not annotate all tasks (the negative dummy annotations are skipped).

We update the confusion matrix of a meta-worker row by row. Here π_j(z, z′) represents the probability of mistaking a task of label z as z′; as such, the denominator in Eq. (10) is the number of tasks with (estimated) label z, and the numerator is the number of tasks whose label is z but is mistaken as z′. We need to count a total of N × N label confusion cases, and update each entry in the confusion matrix.

π_j(z, z′) = Σ_i 1( ŷ_i = z ∧ argmax_{z″} a_ij(z″) = z′ ) / Σ_i 1( ŷ_i = z ),    α_j = c_j / |A_j|    (10)
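The EM-style aggregation loop can be sketched as below; for brevity this version models every worker with a single accuracy value (rather than a full confusion matrix for meta-workers), so it is a simplified variant of the scheme described above.

```python
# Simplified EM-style aggregation: alternate between estimating consensus
# labels by weighted voting and re-estimating each worker's accuracy.
import numpy as np

def aggregate(annotations, n_classes, n_iter=20):
    # annotations: dict {worker: {task: label}}; workers may skip tasks.
    workers = list(annotations)
    tasks = sorted({t for votes in annotations.values() for t in votes})
    acc = {w: 0.7 for w in workers}              # initial accuracy guess

    consensus = {}
    for _ in range(n_iter):
        # E-step: weighted vote per task, spreading each vote by accuracy.
        for t in tasks:
            scores = np.zeros(n_classes)
            for w in workers:
                if t in annotations[w]:
                    dist = np.full(n_classes, (1 - acc[w]) / (n_classes - 1))
                    dist[annotations[w][t]] = acc[w]
                    scores += dist
            consensus[t] = int(np.argmax(scores))
        # M-step: accuracy = fraction of a worker's votes matching consensus.
        for w in workers:
            done = list(annotations[w])
            hits = sum(annotations[w][t] == consensus[t] for t in done)
            acc[w] = hits / max(len(done), 1)
    return consensus, acc

# Toy usage: three workers, three binary tasks.
votes = {"w1": {0: 1, 1: 0, 2: 1},
         "w2": {0: 1, 1: 1, 2: 1},
         "w3": {0: 0, 1: 0, 2: 1}}
print(aggregate(votes, n_classes=2))
```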

Our approach (MetaCrowd) is summarized in Algorithm 1. We first use clustering and crowdsourcing to transform the crowdsourcing project into an N-way K-shot few-shot learning problem (lines 1-4). Then meta learning algorithms are used to train the meta-workers (lines 5-10), which annotate all the remaining crowdsourcing tasks. After that, the JS divergence is employed to measure task difficulty, and the difficult tasks are annotated again by crowd workers (lines 11-16). In the end, we gather all the annotations, correct them, and aggregate them to compute the consensus labels (lines 17-24).

Input: Project (T tasks of N classes), related datasets for meta-training, meta learning algorithms Λ (N-way K-shot), number of meta-workers M, divergence threshold τ, crowd workers
Output: crowdsourcing consensus labels
1 Cluster the project tasks into N clusters;
2 Query crowd workers on the instances closest to each cluster center to obtain the support set S;
3 Use S and its annotations to model the crowd workers;
4 Form the query set Q with the remaining tasks;
5 Meta-workers ← ∅;
6 for each meta learning algorithm in Λ do
7       Use the algorithm on the meta-training set to extract meta-knowledge;
8       Use the meta-knowledge and the support set S to generate a meta-worker;
9       Add the meta-worker to the set of meta-workers;
10
11 end for
12 for each task in Q do
13       Use the meta-workers to obtain meta annotations;
14       if the annotation divergence exceeds τ then
15             Assign n extra crowd workers to get crowd annotations;
16
17       end if
18
19 end for
20 Complete the crowd annotation matrix with dummy annotations;
21 while not converged do
22       Correct the crowd annotations using Eq. (6);
23       Correct the meta annotations using Eq. (7);
24       Combine all annotations to estimate the consensus labels using Eq. (8);
25       Update the crowd workers' model α and the meta-workers' model π using Eq. (10);
26
27 end while
28 Return the consensus labels;
Algorithm 1 MetaCrowd: Crowdsourcing with Meta-workers

4 Experiments

4.1 Experimental Setup

Datasets: We verify the effectiveness of our proposed method MetaCrowd on three real image datasets: Mini-Imagenet [31], 256_Object_Categories [7], and CUB_200_2011 [32]. Each dataset has multiple subclasses, and we treat each subclass in a dataset as an independent task (or a small dataset). Following the dataset division principle recommended for Mini-Imagenet, we divide it into three parts, train, val and test, with a category ratio of 64 : 16 : 20; the other two datasets are processed in a similar way. The statistics of these benchmark datasets are given in Table II. We deem all the data in the 'train' portion as the meta-training set and the data in the 'val' portion as the validation set, and we randomly select N categories from the 'test' portion to form an N-way meta-test set.

Dataset | Test classes | Train + Val classes | Image Num (per class × classes)
Mini-Imagenet | 20 | 64 + 16 | 600 × 100
256_Object | 40 | 128 + 32 | 90 × 200
CUB_200_2011 | 40 | 128 + 32 | 60 × 200
TABLE II: Statistics of the datasets.

Crowd Workers: Following the canonical worker setting [12], we simulate four types of workers (spammer, random, normal, and expert), with different capacity (accuracy) ranges and proportions, as shown in Table III. We set up three different worker proportions to study the influence of low-, normal-, and high-reliability crowds; their weighted average capacities are 0.535, 0.600 and 0.650, respectively. We generate 30 crowd workers for Mini-Imagenet, and 10 crowd workers for 256_Object_Categories and CUB_200_2011, following the setup in Table III, to initialize our N-way K-shot support set and to provide additional annotations for the tasks on which the meta-workers disagree.

Worker type Floor Ceiling Proportions
spammer 0.10 0.25 10% 10% 10%
random 0.25 0.50 20% 10% 10%
normal 0.50 0.80 60% 70% 50%
expert 0.80 1.00 10% 10% 30%
TABLE III: Crowd worker setup: proportions and capacity ranges. We simulate three groups of crowds with different worker-type proportions, whose overall ability increases from the first to the third group. The second group is the typical setup we recommend.
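A sketch of how such a worker pool can be simulated is given below; the sampling details are our own assumptions, while the ranges and proportions follow the typical setup of Table III.

```python
# Simulate a crowd of workers with the typical proportions from Table III
# and let each one annotate tasks with accuracy drawn from its range.
import random

WORKER_TYPES = [            # (name, floor, ceiling, proportion) -- typical setup
    ("spammer", 0.10, 0.25, 0.10),
    ("random",  0.25, 0.50, 0.10),
    ("normal",  0.50, 0.80, 0.70),
    ("expert",  0.80, 1.00, 0.10),
]

def make_workers(n):
    workers = []
    for _ in range(n):
        r, cum = random.random(), 0.0
        for name, lo, hi, p in WORKER_TYPES:
            cum += p
            if r <= cum:
                workers.append((name, random.uniform(lo, hi)))
                break
    return workers

def annotate(true_label, accuracy, n_classes=5):
    if random.random() < accuracy:
        return true_label
    return random.choice([c for c in range(n_classes) if c != true_label])

crowd = make_workers(30)
print(crowd[:3])
print([annotate(2, acc) for _, acc in crowd[:10]])
```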

We compare our MetaCrowd against five related and representative methods.
(i) Reqall (Request and allocate) [16] is a typical solution for budget saving. Reqall dynamically determines the amount of annotation required for a given task: it stops requesting further annotations for a task when the weighted ratio between the vote counts of the top two classes reaches a preset threshold, or when the maximum number of annotations is reached. Reqall assumes the workers' abilities are known and mainly focuses on binary tasks; we follow the advice in the original paper to convert multi-class problems into binary ones. We fix its quality requirement to 3, consistent with MetaCrowd.
(ii) QASCA [48] is a classical task assignment solution; it estimates the quality improvement obtained if a worker is assigned a set of tasks (from a pool of tasks), and then selects the set that yields the highest expected quality improvement. We set the budget to an average of 3 annotations per task (similar to Reqall).
(iii) Active (Active crowdsourcing) [5] takes into account budget saving and worker/task selection for crowdsourcing. Active combines task domain (meta-test) and source domain (meta-training) data using sparse coding to get a more concise high-level representation of the tasks; it then uses a probabilistic graphical model to model both workers and tasks, and applies active learning to select the right worker for the right task.
(iv) AVNC (Adaptive voting noise correction) [42] tries to identify and eliminate potential noisy annotations, and then uses the remaining clean instances to train a set of classifiers to predict and correct the noisy annotations before truth inference. We use MV and WMV as its consensus algorithms and set the budget to an average of 3 annotations per task (similar to Reqall and QASCA).
(v) ST (Self-training method) [37] first trains the annotator with the labeled instances in the pool. The annotator then performs the remaining tasks, picks out the instance with the highest-confidence label, and merges it into the labeled pool. The above steps are repeated until all tasks are labeled.
(vi) MetaCrowd and its variants adopt three meta-workers trained by three types of meta algorithms (MAML, MN, and RN) under the N-way K-shot setting. In our experiments, we consider two variants for the ablation study. MetaCrowd-OC follows the canonical crowdsourcing principle and uses only crowd workers: all the tasks are annotated by three crowd workers, and we deem its accuracy and budget as the baseline performance. MetaCrowd-OM uses only meta-workers to annotate the tasks, even when they disagree. The remaining settings of the variants are the same as MetaCrowd.

For parameters of the compared methods not mentioned above, we adopt the settings recommended in their original papers.

Data Reqall QASCA Active AVNC(MV) AVNC(WMV) ST MetaCrowd-OC(MV) MetaCrowd-OC(WMV) MetaCrowd-OM(MV) MetaCrowd-OM(WMV) MetaCrowd(MV) MetaCrowd(WMV)
Mini-Imagenet 0.778 0.781 0.733 0.768 0.808 0.307 0.624 0.763 0.698 0.764 0.748 0.825
256_Object 0.785 0.796 0.740 0.773 0.812 0.315 0.634 0.771 0.679 0.757 0.752 0.828
CUB_200_2011 0.779 0.792 0.731 0.756 0.800 0.293 0.622 0.760 0.703 0.773 0.732 0.817
(a) Accuracy of compared methods with the crowd workers’ weighted average capacity as 0.535 (more low-quality workers).
Data Reqall QASCA Active AVNC(MV) AVNC(WMV) ST MetaCrowd-OC(MV) MetaCrowd-OC(WMV) MetaCrowd-OM(MV) MetaCrowd-OM(WMV) MetaCrowd(MV) MetaCrowd(WMV)
Mini-Imagenet 0.821 0.824 0.775 0.792 0.833 0.307 0.672 0.806 0.698 0.764 0.767 0.840
256_Object 0.828 0.827 0.777 0.807 0.838 0.315 0.664 0.792 0.679 0.757 0.778 0.855
CUB_200_2011 0.812 0.819 0.766 0.784 0.821 0.293 0.681 0.811 0.703 0.773 0.749 0.836
(b) Accuracy of compared methods with the crowd workers’ weighted average capacity as 0.600.
Data Reqall QASCA Active AVNC(MV) AVNC(WMV) ST MetaCrowd-OC(MV) MetaCrowd-OC(WMV) MetaCrowd-OM(MV) MetaCrowd-OM(WMV) MetaCrowd(MV) MetaCrowd(WMV)
Mini-Imagenet 0.869 0.862 0.813 0.842 0.888 0.307 0.749 0.860 0.698 0.764 0.834 0.903
256_Object 0.858 0.880 0.817 0.857 0.891 0.315 0.757 0.858 0.679 0.757 0.842 0.911
CUB_200_2011 0.852 0.873 0.821 0.840 0.877 0.293 0.740 0.853 0.703 0.773 0.828 0.907
(c) Accuracy of compared methods with the crowd workers’ weighted average capacity as 0.650 (more high-quality workers).
TABLE IV: Accuracy of the compared methods under three different crowd worker settings. The MV/WMV columns of AVNC, MetaCrowd-OC, MetaCrowd-OM and MetaCrowd use majority vote and weighted majority vote, respectively, as the consensus algorithm. Note that ST and MetaCrowd-OM have no crowd workers involved, so their accuracy does not change across settings.

4.2 Analysis of the Results

Table IV gives the accuracy of the methods under comparison, grouped into four categories: dynamic task allocation (Reqall and QASCA), active learning (Active), machine self-correction (AVNC and ST), and meta learning (MetaCrowd-OC, MetaCrowd-OM, and MetaCrowd). In particular, AVNC and the MetaCrowd variants report results with both majority vote and weighted majority vote for computing consensus labels, while the other methods adopt their own consensus solutions. We have several important observations.
(i) MetaCrowd vs. Self-training: ST uses supervised self-training to gradually annotate the tasks, and it has the lowest accuracy among all the compared methods. This is because the lack of meta-knowledge (labeled training data) makes traditional supervised methods infeasible under the few-shot learning setting. When the number of labeled instances is small, ST cannot train an effective model, so the quality of the pseudo-labels is low, the influence of the errors keeps growing, and the self-training classifier eventually fails. The other crowdsourcing solutions obtain a much higher accuracy by modeling tasks and workers, and MetaCrowd achieves the best performance through both meta learning and ensemble learning. This shows that the extraction of meta-knowledge from relevant domains is crucial for the few-shot learning process.
(ii) MetaCrowd vs. Active: Both MetaCrowd and Active try to reduce the number of annotations to save the budget. By modeling workers and tasks, Active assigns only the single most appropriate worker to the task to save the budget and to improve the quality. In contrast, MetaCrowd leverages meta-knowledge and initial labels from crowd workers to automatically annotate a large portion of simple tasks, and invites crowd workers to annotate a small number of difficult tasks. Thus, both quality and budget saving can be achieved. MetaCrowd-OM and MetaCrowd both achieve a significantly higher accuracy than Active. This is because crowd workers have diverse preferences and the adopted active learning strategy of Active cannot reliably model workers due to the limited data. By obtaining additional annotations for the most uncertain tasks and meta annotations for all the tasks, MetaCrowd achieves an accuracy improvement of 8%. Even with a large number of annotations from plain crowd workers, MetaCrowd-OC still loses to MetaCrowd, which proves the rationality of empowering crowdsourcing with meta learning.
(iii) MetaCrowd vs. Reqall: Both MetaCrowd and Reqall dynamically determine the number of annotations required for a task based on the annotation results; thus they can trade off budget against quality. Reqall assumes that the workers' abilities are known and mainly focuses on binary tasks (we follow the recommended approach to adapt it to the multi-class case). With no more than three annotations per task on average (the MetaCrowd-OC baseline setting), MetaCrowd beats Reqall both in quality and budget, because MetaCrowd can leverage the classifiers to do most of the simple tasks and save the budget for the difficult ones. These results show the effectiveness of our human-machine hybrid approach.
(iv) MetaCrowd vs. QASCA: QASCA and Reqall both decide the next task assignment based on the current annotation results. QASCA does not assume that the workers' abilities are known, but estimates them with the EM algorithm, so its actual performance is much better than Reqall's. However, QASCA is still beaten by MetaCrowd. This is because QASCA suffers from a cold start problem: at the beginning, it can only treat all workers as equally reliable, and is thus more affected by low-quality workers. In contrast, MetaCrowd has a relatively accurate understanding of the workers from the start, owing to the meta-test set construction process and to the exclusion of low-quality workers from the difficult tasks.
(v) Modeling vs. non-modeling of workers: In crowdsourcing, we often need to model tasks and/or workers to better perform the tasks and compute the consensus labels of the tasks. Comparing the second column (weighted majority vote) of AVNC, MetaCrowd-OC, MetaCrowd-OM and MetaCrowd with their respective non-model version (first column, majority vote), we can draw the conclusion that, by modeling workers, we can better compute the consensus labels of tasks from their annotations. This advantage is due to two factors: we introduce a worker model to separately account for crowd workers and meta-workers; and we model the difficulty of tasks using divergence, and pay more attention to the difficult ones.
(vi) Robustness to different situations: We treat the results in Table IV(b) as the baseline; Table IV(a) has more noisy workers, while Table IV(c) has more experts. Comparing the results in Table IV(a) and Table IV(b), we can see that although the average quality of the crowd workers drops, the accuracy of MetaCrowd decreases only slightly, while the other methods suffer a larger reduction. This can be explained by two factors. First, MetaCrowd uses sufficient data to model the workers while building the meta-test set, so potential low-quality crowd workers can be identified; second, most of the simple task annotations are provided by ordinary but reliable meta-workers, and we only consult crowd workers with fair quality for the difficult tasks. Therefore, MetaCrowd reduces the impact of low-quality workers and obtains more robust results. In the less common situation with many experts (Table IV(c)), MetaCrowd also achieves the best performance. This confirms that MetaCrowd is suitable for a variety of worker compositions, especially when the capacity of the workers is not so good.

4.3 Budget Saving

We use quantitative analysis and simulation results to illustrate the advantage of MetaCrowd for budget saving in terms of the number of used annotations. For this quantitative analysis, we adopt the typical assumption that the expense of a single annotation is uniformly fixed. We separately estimate the budget of MetaCrowd-OC, Reqall, QASCA, Active and MetaCrowd. AVNC adopts the same workflow as MetaCrowd-OC, except for the noise correction operation, so its budget is the same as MetaCrowd-OC. On the other hand, ST and MetaCrowd-OM have basically no crowd workers involved, so they are not considered here.

4.3.1 Quantitative Analysis

The number of annotations used by some methods can be estimated in advance; we first calculate them theoretically.
MetaCrowd-OC asks three crowd workers to annotate each task and thus consumes a total of 3T annotations, where T is the number of tasks.
MetaCrowd needs some initial annotations to kick off the training of the meta-workers. The total number of annotations needed by MetaCrowd is about m · C · N · K + n · β · T, where N · K is the size of the support set, C is the number of crowd workers providing the repeated initial annotations, m is the margin amplification factor that ensures the support set can be formed, n is the number of extra annotations per difficult task, and β is the ratio (determined by the divergence threshold τ) of instances that need additional annotations (the other symbols can be found in Table I). Usually N · K ≪ T, so if T is big enough the number of annotations can be simplified to about n · β · T.
Active needs about T/3 annotated tasks to build a stable model of workers and tasks before applying active learning; each of the remaining tasks then needs only the single most suitable crowd worker. We let three crowd workers annotate each of these T/3 tasks, so the total number of annotations for Active is about 3 · T/3 + 2T/3 = 5T/3.
QASCA sets the total budget to an average of 3 annotations per task, so it also consumes a total of 3T annotations.
Reqall depends on the quality requirement, the task difficulty, and the workers' abilities to determine the number of consumed annotations, so the required number of annotations cannot be explicitly quantified. In the multi-class case, Reqall considers only the two classes with the most votes and thus wastes part of the budget.

In summary, the total number of annotations needed by MetaCrowd-OC, Active, MetaCrowd and QASCA is about 3T, 5T/3, n · β · T and 3T, respectively.

4.3.2 Simulation Results

Following the experimental settings of the previous subsection, we count the number of consumed crowd worker annotations on Mini-Imagenet (T = 3000 tasks, 5-way 5-shot) as an example.
Both MetaCrowd-OC and QASCA require 3T = 9000 annotations.
MetaCrowd costs 3971 annotations.
Active asks for about 3000 annotations to build the model, and the other 2000 tasks consume about 2000 annotations, for a total of about 5000 annotations.
Reqall, with the requirement that the budget should be no more than three annotations per task on average, achieves a competitive quality and consumes about 2.8 annotations per task, amounting to a total of 8424 annotations.
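These per-method counts can be reproduced with a few lines of arithmetic; the split of Active into model-building and remaining tasks follows the description in Section 4.3.1, and MetaCrowd's count (3971) depends on the run and is therefore not recomputed here.

```python
# Reproduce the annotation counts for Mini-Imagenet (T = 3000 tasks).
T = 3000
budget_oc = 3 * T                            # MetaCrowd-OC: 3 workers per task
budget_qasca = 3 * T                         # QASCA: fixed 3 annotations/task
budget_active = 3 * (T // 3) + (T - T // 3)  # model building + 1 per remaining task
budget_reqall = round(2.8 * T)               # ~2.8 annotations per task on average
print(budget_oc, budget_qasca, budget_active, budget_reqall)
# -> 9000 9000 5000 8400 (Reqall's exact count in Table V is 8424)
```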

Data Task Reqall QASCA Active MetaCrowd-OC MetaCrowd
Budget Quality Budget Quality Budget Quality Budget Quality Budget Quality
Mini-Imagenet 3000 8424 0.821 9000 0.824 5000 0.775 9000 0.806 3971 0.840
256_Object 450 1256 0.828 1350 0.827 750 0.777 1350 0.792 654 0.855
CUB_200_2011 300 837 0.812 900 0.819 500 0.766 900 0.811 530 0.836
TABLE V: Number of annotations used and accuracy of Reqall, QASCA, Active, MetaCrowd-OC and MetaCrowd. The 'Budget' columns report the number of used annotations, and the 'Quality' columns report the consensus label accuracy. We use the typical crowd worker proportions (second group) of Table III.

The total number of annotations and the accuracy of the methods on the three datasets are listed in Table V. MetaCrowd outperforms Active, Reqall, and MetaCrowd-OC in terms of both budget and quality in almost all cases, and its budget-saving advantage becomes more prominent as the size of the crowdsourcing project increases. The only exception is that MetaCrowd loses to Active in terms of budget on CUB_200_2011, due to the small size of this dataset. In fact, MetaCrowd is not suitable for crowdsourcing projects with a relatively small size, or with an extremely large label space, in which the repeated annotations of the support set for meta learning consume a large portion of the budget.

4.4 Parameter Analysis

4.4.1 Parameters in Meta Learning

We study the impact of some preset parameters of MetaCrowd, namely N and K for the meta learning algorithms. Generally, the values of N and K are determined by the given problem. Here we simulate the influence of N and K on the meta learning algorithm using Mini-Imagenet with the MAML algorithm; other dataset and algorithm combinations lead to similar conclusions. We vary N or K from 1 to 10 while keeping the other fixed as in the previous experiments.

We can see from Fig. 3 that the accuracy increases as the number of shots K increases, but the increment gradually slows down, which is consistent with our intuition that more annotated support instances facilitate the training of more credible meta-workers and hence improve the quality. On the other hand, as the number of classes N increases, the accuracy gradually decreases, but it is always much higher than that of random guessing, which confirms the effectiveness of the meta learning algorithm for few-shot learning tasks. In crowdsourcing, N is determined by the crowdsourcing project itself, and the only parameter we can choose is K. A larger K gives a better meta-worker, but it is expensive to form a support set with a large K; as such, a moderate value (K = 5, as in Fig. 4) is adopted as a trade-off in this paper.

Fig. 3: Accuracy vs. the number of class labels N and the number of training instances per label K. Note that as N increases, the accuracy of random guessing (1/N) decreases accordingly.

The heat-maps of the confusion matrices of our meta-workers under a 5-way 5-shot setting are shown in Fig. 4. We find that the accuracy of all three meta-workers is about 0.6 (the values on the diagonal of each confusion matrix), which is in accordance with the average capacity of our normal crowd workers in Table III. In addition, the meta-workers also manifest different preferences and are capable of doing different tasks.

(a) MAML
(b) MN
(c) RN
Fig. 4: Heat-maps of confusion matrices of three meta-workers (MAML, MN and RN). It can be seen that the accuracy of the meta-workers is around 0.6, and these meta-workers have different induction preferences, which are beneficial for ensemble learning.

4.4.2 Parameters in Crowdsourcing

Here we study the impact of the divergence threshold τ and of the number n of additional annotations per difficult task on the trade-off between crowdsourcing quality and budget. Generally speaking, a smaller τ and a larger n lead to a better crowdsourcing quality while also consuming more budget, so setting appropriate values for these two parameters to meet the quality and budget requirements as much as possible is critical.

The divergence quantification metric (see Eq. (5)) lies within [0, 1], so we vary τ from 0 to 1 with an interval of 0.05. For each value, we assign n crowd workers to the tasks whose divergence exceeds τ, and finally count the number of instances that received further crowd annotations. Fig. 5 gives the results under different values of τ. We can see that as τ decreases from 1 to 0, the number of instances that need additional annotations gradually increases, and the aggregation accuracy also increases. The overall trend can be roughly divided into three stages according to the value of τ. In the first stage (large τ), although τ varies over a wide range, the number of tasks that need to be manually annotated and the quality of the crowdsourcing project are almost unchanged; this is because the divergence of most tasks is distributed within a lower range, so the influence of τ in this stage is very limited. In the second stage, the budget and the quality of the crowdsourcing project are very sensitive to τ: with the decrease of τ, both budget and quality increase significantly. In our experiments, setting the value of τ within [0.3, 0.6] is a reasonable choice. In the third stage (small τ), the budget of the crowdsourcing project still increases rapidly as τ changes, but the quality remains relatively stable with little improvement. This can be explained by the fact that the tasks with a small degree of divergence are usually relatively simple tasks on which the meta-workers agree, so further manual annotations do not significantly improve the quality. Based on these results, we adopt a value of τ within this range for our experiments, as a trade-off between quality and budget.

(a) Mini-Imagenet
(b) 256_Object
(c) CUB_200_2011
Fig. 5: Accuracy and the number of (kilo) difficult tasks with the change of divergence threshold on three datasets.

We also study the impact of the number n of additional annotations received by each difficult task on the quality and budget of crowdsourcing. The experimental results in Fig. 6 are in line with our intuition: as n increases, the quality of crowdsourcing gradually increases and then becomes relatively stable, while the accompanying budget increases linearly. If n = 0, MetaCrowd degenerates into MetaCrowd-OM. Given that, we fix n = 3, which is the same as the number of meta-workers in our experiments.

(a) Mini-Imagenet
(b) 256_Object
(c) CUB_200_2011
Fig. 6: Accuracy and the number of (kilo) extra annotations with the change of annotations received for each difficult task on three datasets.

5 Conclusion

In this paper, we study how to leverage meta learning in crowdsourcing for budget saving and quality improvement, and propose an approach called MetaCrowd (Crowdsourcing with Meta-Workers) that implements this idea. MetaCrowd uses meta learning to train capable meta-workers for crowdsourcing tasks and thus saves the budget. Meanwhile, it quantifies the divergence between the meta-workers' annotations to model the difficulty of tasks, and collects additional annotations for difficult tasks from crowd workers to improve the quality. Experiments on benchmark datasets show that MetaCrowd is superior to representative methods in terms of budget saving and crowdsourcing quality.

Our method is more tolerant of low-quality crowd workers than the compared methods, but it has certain requirements on the types and characteristics of the crowdsourcing tasks. One possible improvement of our work lies in obtaining the initial labeled dataset required for meta learning with a smaller budget.

References

  • [1] R. Boim, O. Greenshpan, T. Milo, S. Novgorodov, N. Polyzotis, and W. Tan (2012) Asking the right questions in crowd data sourcing. In 28th IEEE International Conference on Data Engineering, pp. 1261–1264. Cited by: §1.
  • [2] J. Bromley, J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Säckinger, and R. Shah (1993) Signature verification using a “siamese” time delay neural network. International Journal of Pattern Recognition and Artificial Intelligence 7 (04), pp. 669–688. Cited by: §2.
  • [3] A. I. Chittilappilly, L. Chen, and S. Amer-Yahia (2016) A survey of general-purpose crowdsourcing techniques. IEEE Transactions on Knowledge and Data Engineering 28 (9), pp. 2246–2266. Cited by: §1.
  • [4] A. P. Dawid and A. M. Skene (1979) Maximum likelihood estimation of observer error-rates using the em algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics) 28 (1), pp. 20–28. Cited by: §3.6.1, §3.6.2.
  • [5] M. Fang, J. Yin, and D. Tao (2014) Active learning for crowdsourcing using knowledge transfer. In AAAI Conference on Artificial Intelligence, pp. 1809–1815. Cited by: §2, §4.1.
  • [6] C. Finn, P. Abbeel, and S. Levine (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pp. 1126–1135. Cited by: §2, §3.4.
  • [7] G. Griffin, A. Holub, and P. Perona (2007) Caltech-256 object category dataset. Cited by: §4.1.
  • [8] S. Guo, A. Parameswaran, and H. Garcia-Molina (2012) So who won? dynamic max discovery with the crowd. In ACM SIGMOD International Conference on Management of Data, pp. 385–396. Cited by: §3.6.1.
  • [9] J. Howe (2006) The rise of crowdsourcing. Wired Magazine 14 (6), pp. 1–4. Cited by: §1.
  • [10] P. G. Ipeirotis, F. Provost, and J. Wang (2010) Quality management on amazon mechanical turk. In Proceedings of the ACM SIGKDD workshop on human computation, pp. 64–67. Cited by: §3.6.1, §3.6.2.
  • [11] D. R. Karger, S. Oh, and D. Shah (2011) Iterative learning for reliable crowdsourcing systems. In Advances in Neural Information Processing Systems, pp. 1953–1961. Cited by: §3.6.1.
  • [12] G. Kazai, J. Kamps, and N. Milic-Frayling (2011) Worker types and personality traits in crowdsourcing relevance labels. In ACM International Conference on Information and Knowledge Management, pp. 1941–1944. Cited by: §1, §4.1.
  • [13] Ł. Korycki and B. Krawczyk (2017) Combining active learning and self-labeling for data stream mining. In International Conference on Computer Recognition Systems, pp. 481–490. Cited by: §2.
  • [14] G. Li, J. Wang, Y. Zheng, and M. J. Franklin (2016) Crowdsourced data management: a survey. IEEE Transactions on Knowledge and Data Engineering 28 (9), pp. 2296–2319. Cited by: §1.
  • [15] G. Li, Y. Zheng, J. Fan, J. Wang, and R. Cheng (2017) Crowdsourced data management: overview and challenges. In ACM International Conference on Management of Data, pp. 1711–1716. Cited by: §2, §3.5.
  • [16] Q. Li, F. Ma, J. Gao, L. Su, and C. J. Quinn (2016) Crowdsourcing high quality labels with a tight budget. In ACM International Conference on Web Search and Data Mining, pp. 237–246. Cited by: §1, §4.1.
  • [17] L. Liu, T. Zhou, G. Long, J. Jiang, and C. Zhang (2020) Many-class few-shot learning on multi-granularity class hierarchy. IEEE Transactions on Knowledge and Data Engineering 99 (1), pp. 1–14. Cited by: §2.
  • [18] A. Marcus, D. Karger, S. Madden, R. Miller, and S. Oh (2012) Counting with the crowd. VLDB Endowment 6 (2), pp. 109–120. Cited by: §2.
  • [19] B. Mozafari, P. Sarkar, M. Franklin, M. Jordan, and S. Madden (2014) Scaling up crowd-sourcing to very large datasets: a case for active learning. VLDB Endowment 8 (2), pp. 125–136. Cited by: §2.
  • [20] T. Munkhdalai and H. Yu (2017) Meta networks. In International Conference on Machine Learning, pp. 2554–2563. Cited by: §2, §3.4.
  • [21] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap (2016) Meta-learning with memory-augmented neural networks. In International Conference on Machine Learning, pp. 1842–1850. Cited by: §2.
  • [22] V. S. Sheng, F. Provost, and P. G. Ipeirotis (2008) Get another label? improving data quality and data mining using multiple, noisy labelers. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622. Cited by: §1.
  • [23] V. S. Sheng and J. Zhang (2019) Machine learning with crowdsourcing: a brief summary of the past research and future directions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 9837–9843. Cited by: §1.
  • [24] J. Snell, K. Swersky, and R. Zemel (2017) Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pp. 4077–4087. Cited by: §2.
  • [25] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales (2018) Learning to compare: relation network for few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208. Cited by: §1, §2, §3.4.
  • [26] Y. Tong, L. Chen, Z. Zhou, H. V. Jagadish, L. Shou, and W. Lv (2018) SLADE: a smart large-scale task decomposer in crowdsourcing. IEEE Transactions on Knowledge and Data Engineering 30 (8), pp. 1588–1601. Cited by: §2.
  • [27] Y. Tong, L. Wang, Z. Zhou, L. Chen, B. Du, and J. Ye (2018) Dynamic pricing in spatial crowdsourcing: a matching-based approach. In ACM SIGMOD International Conference on Management of Data, pp. 773–788. Cited by: §2.
  • [28] J. Tu, G. Yu, J. Wang, C. Domeniconi, and X. Zhang (2020) Attention-aware answers of the crowd. In SIAM International Conference on Data Mining, pp. 451–459. Cited by: §1.
  • [29] J. Tu, G. Yu, J. Wang, C. Domeniconi, M. Guo, and X. Zhang (2020) CrowdWT: crowdsourcing via joint modeling of workers and tasks. ACM Transactions on Knowledge Discovery from Data 99 (1), pp. 1–24. Cited by: §2, §3.3, §3.5.
  • [30] J. Vanschoren (2018) Meta-learning: a survey. arXiv preprint arXiv:1810.03548. Cited by: §1, §2, §3.4.
  • [31] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. (2016) Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638. Cited by: §4.1.
  • [32] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie (2011) The caltech-ucsd birds-200-2011 dataset. Cited by: §4.1.
  • [33] J. Wang, T. Kraska, M. J. Franklin, and J. Feng (2012) CrowdER: crowdsourcing entity resolution. VLDB Endowment 5 (11), pp. 1483–1494. Cited by: §2, §3.6.2.
  • [34] J. Wang, G. Li, T. Kraska, M. J. Franklin, and J. Feng (2013) Leveraging transitive relations for crowdsourced joins. In ACM SIGMOD International Conference on Management of Data, pp. 229–240. Cited by: §2.
  • [35] P. Welinder, S. Branson, P. Perona, and S. Belongie (2010) The multidimensional wisdom of crowds. Advances in Neural Information Processing Systems 23, pp. 2424–2432. Cited by: §1.
  • [36] J. Whitehill, T. Wu, J. Bergsma, J. Movellan, and P. Ruvolo (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In Advances in Neural Information Processing Systems, pp. 2035–2043. Cited by: §3.6.1.
  • [37] Q. Xie, M. Luong, E. Hovy, and Q. V. Le (2020) Self-training with noisy student improves imagenet classification. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 10687–10698. Cited by: §2, §4.1.
  • [38] D. Yarowsky (1995) Unsupervised word sense disambiguation rivaling supervised methods. In Annual Meeting of the Association for Computational Linguistics, pp. 189–196. Cited by: §2.
  • [39] D. Yoo and I. S. Kweon (2019) Learning loss for active learning. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 93–102. Cited by: §2.
  • [40] G. Yu, J. Tu, J. Wang, C. Domeniconi, and X. Zhang (2020) Active multilabel crowd consensus. IEEE Transactions on Neural Networks and Learning Systems 99 (1). Cited by: §2, §2, §3.5.
  • [41] C. J. Zhang, L. Chen, H. V. Jagadish, and C. C. Cao (2013) Reducing uncertainty of schema matching via crowdsourcing. VLDB Endowment 6 (9), pp. 757–768. Cited by: §3.6.1.
  • [42] J. Zhang, V. S. Sheng, T. Li, and X. Wu (2017) Improving crowdsourced label quality using noise correction. IEEE Transactions on Neural Networks and Learning Systems 29 (5), pp. 1675–1688. Cited by: §2, §4.1.
  • [43] J. Zhang, M. Wu, and V. S. Sheng (2018) Ensemble learning from crowds. IEEE Transactions on Knowledge and Data Engineering 31 (8), pp. 1506–1519. Cited by: §2.
  • [44] Y. Zhang and Q. Yang (2018) An overview of multi-task learning. National Science Review 5 (1), pp. 30–43. Cited by: §3.1.
  • [45] L. Zheng and L. Chen (2018) Dlta: a framework for dynamic crowdsourcing classification tasks. IEEE Transactions on Knowledge and Data Engineering 31 (5), pp. 867–879. Cited by: §2.
  • [46] Y. Zheng, G. Li, and R. Cheng (2016) Docs: a domain-aware crowdsourcing system using knowledge bases. VLDB Endowment 10 (4), pp. 361–372. Cited by: §1.
  • [47] Y. Zheng, G. Li, Y. Li, C. Shan, and R. Cheng (2017) Truth inference in crowdsourcing: is the problem solved?. Proceedings of the VLDB Endowment 10 (5), pp. 541–552. Cited by: §3.3.
  • [48] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng (2015) QASCA: a quality-aware task assignment system for crowdsourcing applications. In ACM SIGMOD International Conference on Management of Data, pp. 1031–1046. Cited by: §1, §4.1.