This is especially helpful in situations where crowd workers lack the expertise to respond to complex questions directly. Each worker is given the entire set of questions in batch mode and provides responses in the form of a vector. These binary questions can be posted as “microtasks” on crowdsourcing platforms such as Amazon Mechanical Turk. To improve classification performance in crowdsourcing systems, most works in the literature focus on enhancing the quality of individual tests, by designing fusion rules to combine decisions from heterogeneous workers [6, 7, 1, 2, 3, 4], and by investigating the assignment of different tests to different workers depending on their skill level [8, 9]. These problems have also been extended to budget-constrained environments to improve classification performance [10, 11, 12].
In this paper, we present a new paradigm for classification in crowdsourcing systems in which binary questions (microtasks) are asked in a sequential manner. This sequential paradigm, organized as a decision tree, has not been considered in the literature. It provides the opportunity to order the sequence of tests for more efficient classification by reducing the number of questions asked on average. Furthermore, we can obtain a trade-off between cost (number of questions asked) and performance by performing task assignment and using only a subset of workers per node of the decision tree. The best performance with the decision-tree paradigm is achieved when all workers respond to every test in the tree. However, as shown in this paper, the performance of the proposed worker assignment, where each worker responds to only one test rather than all the tests in the tree, is comparable when the number of workers is large.
Related work: Information-theoretic methods have been used to construct efficient decision trees [13, 14]. Classical algorithms such as ID3, C4.5, and CART utilize a top-down tree structure [15, 16, 17]. They categorize the objects at each node (test) into tree branches until a leaf is reached, and objects at a leaf are considered to belong to the same class. At each node, these algorithms search for a threshold-based test on a certain attribute, such that the test can categorize the objects. ID3 and C4.5 construct the decision tree by maximizing the information gain at each node, defined as the reduction in entropy. In CART, the Gini impurity is minimized during test selection at each node.
The first strong assumption in traditional algorithms is that all tests are error-free in determining whether or not an attribute exceeds the threshold. In practical crowdsourcing systems, however, due to noise in observing or measuring the attribute as well as human limitations, there are errors and uncertainties when workers perform the tests. For objects belonging to different classes, the error probability of a specific test may also differ. Existing algorithms do not account for the fact that the error probabilities of the tests play an important role in the design of decision trees.
Another limitation of these algorithms is the assumption of completely known information about object attributes to compute the information gain and Gini impurity, i.e., the conditional probability at each node of the test result given the correct class. Even though some algorithms can handle missing attribute information, they simply discard the missing attributes and use the remaining ones for decision tree construction. In the process of decision tree construction, they need to decide not only which attribute to use, but also the optimal threshold, so the run-time complexity grows with both the number of objects and the number of attributes. In practical crowdsourcing applications, however, we may not have this complete information; what we have are a limited number of tests (binary questions) and the corresponding test results. These limitations of the existing literature motivate the research results presented in this paper.
Major contributions: Instead of assuming that each test in a decision tree is perfect, we account for the fact that there may be errors when tests are performed, and we develop an efficient algorithm to construct decision trees for the imperfect-test scenario. The resulting tree is applicable to many practical problems, including classification performed by unreliable crowdsourcing workers. In our algorithm, the decision tree is constructed from a given set of tests, where each test gives a binary result (0 or 1) depending on which class the object belongs to. We do not assume complete knowledge of the attribute statistics. We provide performance guarantees in terms of an upper bound on the probability of mis-classification (or a lower bound on the probability of correct classification). The time complexity of our algorithm is polynomial in the number of tests, which is usually much smaller than the number of objects and the number of attributes; thus, our complexity is significantly reduced compared to other methods. After the decision tree is constructed, we employ it for classification via crowdsourcing. To reduce the cost in terms of the number of questions asked while maintaining a low probability of mis-classification, we further develop an algorithm to efficiently assign workers to different tests, thereby obtaining a trade-off between the probability of mis-classification and the cost of crowdsourcing.
II Decision Tree in Crowdsourcing
II-A System Model
Consider a classification problem to be solved via crowdsourcing. Suppose there is a set of objects, and each object within the set needs to be classified into one of a finite set of classes. The prior probability that an object belongs to each class is assumed known. An unknown object passes through a series of simple tests (nodes in the decision tree) until it reaches a leaf node and gets classified. We consider that each test provides a binary output for a subset of the classes, thus partitioning the subset of input objects into two output subsets. If an object gets mis-categorized at a test, a mis-classification will occur in the end; the corresponding error probability is denoted accordingly for each class–test pair. Table I
gives an example of test statistics, and Fig. 1 gives two possible testing algorithms. As indicated by Table I, some tests bifurcate the entire set of classes, while others can only bifurcate a subset of the classes. Assuming that all tests have the same error probability, the final mis-classification probabilities of the trees in Fig. 1(a) and Fig. 1(b) differ. Thus, even though the same set of tests is employed, different decision tree structures (orderings of tests) yield different probabilities of mis-classification. Our goal is to build a decision tree that minimizes the mis-classification probability.
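To make the effect of test ordering concrete, the following sketch (not from the paper; the four classes, uniform priors, per-test error EPS, and the per-class depths are hypothetical) models a class that traverses d noisy tests as being correctly classified with probability (1 - EPS)**d, and compares a balanced ordering against a chain-like one:

```python
# Sketch: mis-classification probability depends on test ordering.
# Hypothetical 4-class example; each test errs with probability EPS,
# and a class that traverses d tests is correctly classified with
# probability (1 - EPS)**d.

EPS = 0.05
PRIORS = {"c1": 0.25, "c2": 0.25, "c3": 0.25, "c4": 0.25}

def p_misclass(depths):
    """depths: number of tests traversed by each class before it is classified."""
    return sum(p * (1 - (1 - EPS) ** depths[c]) for c, p in PRIORS.items())

balanced = {"c1": 2, "c2": 2, "c3": 2, "c4": 2}   # every class passes 2 tests
chain    = {"c1": 1, "c2": 2, "c3": 3, "c4": 3}   # deeper classes see more tests

print(p_misclass(balanced))  # 0.0975
print(p_misclass(chain))     # ~0.1082, worse: deep classes accumulate error
```

With the same set of tests, the balanced ordering yields a lower overall mis-classification probability, which is exactly the motivation for optimizing the tree structure.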
Define the test levels of the tree as in Fig. 2, from the root down to the depth of the tree structure. At each level, define the partition of classes induced by the tests applied so far. The cardinality of this partition indicates the degree of completion of the classification task: a larger cardinality means the classification is closer to completion. In the example given in Fig. 2, we have:
Note that at the final level each class has been individually distinguished. We define the entropy at each level as:
where the weights are the probabilities associated with the partition elements at that level. Following this definition, the entropy is largest at the root and decreases to zero at the final level. The entropy at each level will be exploited in choosing the tests for the next level so as to minimize the final probability of mis-classification.
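One possible reading of this level entropy, sketched below with hypothetical uniform priors over four classes, is the uncertainty remaining given the current partition: the prior-weighted class entropy within each block, which is maximal at the root and zero once every class is isolated.

```python
import math

def level_entropy(blocks, priors):
    """Uncertainty remaining at a level: sum over partition blocks of
    p(block) * H(class | block).  One possible reading of the level
    entropy; it decreases to zero once every class is isolated."""
    h = 0.0
    for block in blocks:
        pb = sum(priors[c] for c in block)
        for c in block:
            pc = priors[c] / pb
            if pc > 0:
                h -= pb * pc * math.log2(pc)
    return h

priors = {"c1": 0.25, "c2": 0.25, "c3": 0.25, "c4": 0.25}
print(level_entropy([["c1", "c2", "c3", "c4"]], priors))        # 2.0 at the root
print(level_entropy([["c1", "c2"], ["c3", "c4"]], priors))      # 1.0 after one balanced test
print(level_entropy([["c1"], ["c2"], ["c3"], ["c4"]], priors))  # 0.0 when fully classified
```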
III Proposed Decision Tree Design Algorithms
In this section, we focus on algorithms for decision tree design. We use two types of approximations: an additive approximation to minimize the upper bound on the mis-classification probability, and a multiplicative approximation to maximize the lower bound on the probability of correct classification. Previous work with the objective of minimizing an upper bound on test cost (e.g., memory, execution time) does not consider errors in the tests (noisy tests), which makes it very different from our paper.
III-A Bounding the probability of mis-classification
In a decision tree, the probability of mis-classification is given by:
where each term is the error probability associated with the unknown object as it traverses a node between two consecutive levels. Note that if an object does not pass through a level of the tree, the corresponding error probability is zero.
Typically, the error probability of each test is small; otherwise, the corresponding test should be replaced by a better one. Since the error probability of each test is small, the probability of mis-classification can be approximated by dropping the higher-order terms as
Thus, we have the additive approximation
where each term represents the error probability induced by the tests between two consecutive levels. Recalling the definition of the entropy in (1), we can write:
where the ratio of the entropy reduction between two consecutive levels to the error probability induced between those levels is the metric we use for decision tree construction. Essentially, it indicates the sensitivity to error in reducing uncertainty at a given level of the decision tree design. Denoting the minimum of this metric over all levels, and substituting (4) into (III-A), we have
which leads to
Since our goal is to minimize the probability of mis-classification, we are interested in minimizing this upper bound. Since the total entropy reduction is fixed, we need to maximize the metric at every level. During construction of the testing algorithm, it is therefore sufficient to maximize the metric level by level: when we construct the decision tree from one level to the next, we select the tests that maximize this metric, and the construction ends when the final level is reached.
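The level-by-level selection can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: the test representation (a "yes" set of classes plus a per-test error probability), the entropy notion (prior-weighted class uncertainty within each partition block), and all numbers are hypothetical.

```python
import math

def remaining_entropy(blocks, priors):
    """Uncertainty left in a partition: sum over blocks of
    p(block) * H(class | block)."""
    h = 0.0
    for block in blocks:
        pb = sum(priors[c] for c in block)
        for c in block:
            pc = priors[c] / pb
            h -= pb * pc * math.log2(pc)
    return h

def apply_test(blocks, test):
    """Refine every block of the partition by the test's 'yes' set."""
    out = []
    for block in blocks:
        yes = [c for c in block if c in test["yes"]]
        no = [c for c in block if c not in test["yes"]]
        out.extend(b for b in (yes, no) if b)
    return out

def greedy_select(blocks, tests, priors):
    """Pick the test maximizing entropy reduction per unit error probability."""
    h0 = remaining_entropy(blocks, priors)
    def score(t):
        return (h0 - remaining_entropy(apply_test(blocks, t), priors)) / t["err"]
    return max(tests, key=score)

priors = {"c1": 0.25, "c2": 0.25, "c3": 0.25, "c4": 0.25}
tests = [
    {"name": "t1", "yes": {"c1", "c2"}, "err": 0.05},  # balanced split
    {"name": "t2", "yes": {"c1"},       "err": 0.05},  # peels off one class
]
best = greedy_select([["c1", "c2", "c3", "c4"]], tests, priors)
print(best["name"])  # t1: the balanced split removes more uncertainty per unit error
```

With equal error probabilities the metric reduces to comparing entropy reductions, so the balanced split wins; with unequal error probabilities a less informative but more reliable test can be preferred.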
III-B Bounding the probability of correct classification
In this section, we focus on decision tree design to maximize the probability of correct classification, which can be written as
Since the higher-order terms are typically small, their effect is negligible, and we approximate the probability of correct classification as
where each factor represents the probability of correct classification between two consecutive levels.
Then, we express the entropy in multiplicative form as
where the metric based on which we select tests is the generalized entropy ratio of two consecutive levels, divided by the probability of correct classification between these levels. Essentially, it indicates the degree of reduction in uncertainty when the test correctly bifurcates the objects. Denoting the maximum of this metric over all levels, and substituting (7) into (8), we have
which leads to
As we wish to maximize the probability of correct classification, we are interested in maximizing its lower bound. Since the total entropy term is fixed, we need to minimize the metric at every level. During the construction of the decision tree, it is therefore sufficient to select, from one level to the next, the tests that minimize this metric.
The additive approximation is obtained by discarding the second- and higher-order terms of the error probabilities, while the multiplicative approximation discards only the highest-order terms. Thus, the multiplicative approximation is more accurate than the additive one. However, the tightness of the bounds on the probability of correct classification in the multiplicative method depends on the chosen metric; the choice made in this paper might not be optimal.
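A quick numerical check of why dropping the higher-order terms is harmless for small per-test errors (the three error probabilities below are hypothetical): along a single root-to-leaf path, the exact error 1 - ∏(1 - e_i) and its first-order (additive) approximation ∑ e_i differ only by cross-terms of order e_i·e_j.

```python
import math

# Hypothetical small per-test error probabilities along one root-to-leaf path.
errors = [0.02, 0.03, 0.01]

exact = 1 - math.prod(1 - e for e in errors)  # exact path mis-classification
additive = sum(errors)                        # keeps only first-order terms

print(exact)     # ~0.058906
print(additive)  # 0.06 -- off by only ~0.0011, the discarded cross-terms
```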
In our simulation with the experimental setting shown in Table I, and assuming that all the tests have the same error probability, both methods yield the same decision tree (testing algorithm), shown in Fig. 1(b). Fig. 3 shows the efficiency of the proposed decision tree design algorithm by comparing its probability of mis-classification (blue curve) with the case where the tests are applied in a random order (red curve). As we can see from the figure, the performance is significantly improved with our methods, and the improvement becomes more prominent as the error probability of the tests increases.
IV Worker Assignment
After designing the decision tree, the next step is to assign the available crowd workers to the nodes of the decision tree. The naive and most costly approach is to have all available workers answer the questions corresponding to every node, in which case the number of questions answered equals the number of nodes in the decision tree times the total number of workers. The goal in this section is to investigate the trade-off between the savings in the number of questions answered (cost) and the degradation in performance, as well as to develop an efficient algorithm to assign subsets of workers to different nodes of the tree. In particular, each node must have at least one worker assigned to it; the goal is to find an algorithm to optimally distribute the remaining crowd workers among the nodes of the decision tree. When subgroups of workers are assigned to perform different tests at individual nodes, the workers' local decisions are collected by a fusion center (FC). Majority voting is used in this paper for decision fusion. In a subgroup of workers of odd size, each worker completes the same test, producing a binary result (0 or 1), and each worker has a known probability of error for the corresponding test. Under the majority rule, the FC follows the decision made by the majority of the workers: whichever result is declared by more than half of the workers becomes the FC's decision. For a certain test, we provide the following worker assignment scheme.
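For identical, independent workers, the FC error under majority voting is a binomial tail probability. A minimal sketch (the per-worker error 0.2 is a hypothetical value):

```python
from math import comb

def fc_error(p_err, w):
    """Probability that the majority of w (odd) independent workers, each
    wrong with probability p_err, leads the FC to the wrong decision."""
    return sum(comb(w, k) * p_err**k * (1 - p_err)**(w - k)
               for k in range(w // 2 + 1, w + 1))

# A single worker with error 0.2 versus fused groups of 3 and 5 workers.
print(fc_error(0.2, 1))  # 0.2
print(fc_error(0.2, 3))  # 0.104
print(fc_error(0.2, 5))  # 0.05792
```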
Proposition 1. Suppose the expected probability of error of each worker for a certain test is less than 1/2. Then the probability of error at the FC is a decreasing function of the number of workers assigned to the test, and the reduction in the probability of error at the FC diminishes as the number of workers increases.
Proof: See Appendix A. ∎
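Both claims of Proposition 1 can be verified numerically; the sketch below uses a hypothetical per-worker error of 0.3 and odd group sizes from 1 to 9.

```python
from math import comb

def fc_error(p_err, w):
    """Majority-vote error probability at the FC with w (odd) workers."""
    return sum(comb(w, k) * p_err**k * (1 - p_err)**(w - k)
               for k in range(w // 2 + 1, w + 1))

errs = [fc_error(0.3, w) for w in (1, 3, 5, 7, 9)]
drops = [a - b for a, b in zip(errs, errs[1:])]

# The FC error decreases in w, and each added pair of workers helps less.
assert all(a > b for a, b in zip(errs, errs[1:]))
assert all(a > b for a, b in zip(drops, drops[1:]))
print([round(e, 4) for e in errs])  # [0.3, 0.216, 0.1631, 0.126, 0.0988]
```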
Under the assumption of Proposition 1, after we have constructed a testing algorithm, for example the one shown in Fig. 1(b), each test is first assigned a randomly chosen worker. After that, we assume that a group of additional workers is available to reduce the error probabilities of one or more tests. We add these workers in pairs, so that the number of workers performing each test remains odd. We address the problem of how to assign these additional workers to the different tests so as to achieve the minimum probability of mis-classification.
From the result of Proposition 1, as more workers are assigned to the same test, the rate of reduction in its error probability decreases. Thus, we allocate two workers at a time to a chosen test, which both guarantees an odd number of workers for each test and ensures the maximal rate of reduction in error probability at each step. Using the methods proposed in the previous section, we construct the decision tree and find the level whose construction metric is the worst (both decision tree construction algorithms provide the same result). Among the tests between that level and the next, we add two workers to the test that yields the largest improvement in the metric. We provide the following worker assignment algorithm:
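The pairwise greedy allocation can be sketched as follows. This is a simplified stand-in, not the paper's exact algorithm: it scores candidate tests by the drop in their fused (majority-vote) error probability rather than by the entropy-based construction metric, and the three tests and their per-worker error probabilities are hypothetical.

```python
from math import comb

def fc_error(p_err, w):
    """Majority-vote error for a test answered by w (odd) workers."""
    return sum(comb(w, k) * p_err**k * (1 - p_err)**(w - k)
               for k in range(w // 2 + 1, w + 1))

def assign_workers(test_errs, extra_workers):
    """Start with one worker per test, then greedily add workers two at a
    time to the test whose fused error probability drops the most."""
    counts = {t: 1 for t in test_errs}
    for _ in range(extra_workers // 2):
        def gain(t):
            w = counts[t]
            return fc_error(test_errs[t], w) - fc_error(test_errs[t], w + 2)
        counts[max(counts, key=gain)] += 2
    return counts

# Three hypothetical tests with different per-worker error probabilities.
print(assign_workers({"t1": 0.10, "t2": 0.25, "t3": 0.30}, 6))
```

With six extra workers, each test here ends up with three workers; note that the first pair goes to the test with the largest marginal gain, which is not necessarily the noisiest one, since the marginal benefit of fusion peaks at intermediate error probabilities.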
In our simulations, each worker has the same error probability for all the tests, and Fig. 4 shows the probability of mis-classification as the number of workers increases. The blue curve represents the case where all the workers are assigned to a single (randomly chosen) test; the red curve indicates the scenario where each worker is randomly assigned to a test; and the yellow curve represents the proposed worker assignment rule. We can see from the figure that one should not assign workers in a highly unbalanced fashion, as indicated by the blue curve. Random worker assignment achieves better performance, and is in turn outperformed by our proposed method. The purple curve represents the scenario where each worker participates in all the tests in the decision tree. Though it has the best performance, the induced cost (number of questions answered by workers) is higher by a factor equal to the number of tests. Once the number of workers is sufficiently large, our algorithm achieves performance comparable to the purple curve, at a significantly lower cost.
This work presented a novel sequential paradigm for crowdsourced classification and addressed the test-ordering problem. With limited knowledge of the workers' reliability in performing imperfect tests, we provided a greedy decision tree design to minimize the probability of mis-classification. Two different methods were used to approximate the probabilities of mis-classification and correct classification. We also investigated the worker assignment problem by studying the assignment of a limited number of workers to different tests. Numerical results showed the superiority of our testing algorithm, as well as the efficiency of the worker assignment strategy. While our greedy level-by-level decision tree construction only achieves local optimality, in future work we will explore the possibility of obtaining globally optimal solutions.
-  Q. Li, A. Vempaty, L. R. Varshney, and P. K. Varshney, “Multi-object classification via crowdsourcing with a reject option,” IEEE Trans. Signal Process., vol. 65, no. 4, pp. 1068–1081, Feb 2017.
-  Q. Li and P. K. Varshney, “Optimal crowdsourced classification with a reject option in the presence of spammers,” in IEEE ICASSP, 2018.
-  A. Vempaty, L. R. Varshney, and P. K. Varshney, “Reliable crowdsourcing for multi-class labeling using coding theory,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 4, pp. 667–679, Aug 2014.
-  Q. Li and P. K. Varshney, “Does confidence reporting from the crowd benefit crowdsourcing performance?” in SocialSens’, 2017, pp. 49–54.
-  M. Buhrmester, T. Kwang, and S. D. Gosling, “Amazon’s mechanical turk: A new source of inexpensive, yet high-quality, data?” Perspectives on psychological science, vol. 6, no. 1, pp. 3–5, 2011.
-  O. Dekel and O. Shamir, “Vox populi: Collecting high-quality labels from a crowd,” 2009.
-  P. G. Ipeirotis, F. Provost, and J. Wang, “Quality management on amazon mechanical turk,” in ACM SIGKDD Workshop, 2010, pp. 64–67.
-  C.-J. Ho, S. Jabbari, and J. W. Vaughan, “Adaptive task assignment for crowdsourced classification,” in ICML, 2013, pp. 534–542.
-  S. B. Roy, I. Lykourentzou, S. Thirumuruganathan, S. Amer-Yahia, and G. Das, “Task assignment optimization in knowledge-intensive crowdsourcing,” The VLDB Journal, vol. 24, no. 4, pp. 467–491, 2015.
-  X. Liu, M. Lu, B. C. Ooi, Y. Shen, S. Wu, and M. Zhang, “CDAS: a crowdsourcing data analytics system,” VLDB Endowment, vol. 5, no. 10, pp. 1040–1051, 2012.
-  C.-J. Ho and J. W. Vaughan, “Online task assignment in crowdsourcing markets.” in AAAI, vol. 12, 2012, pp. 45–51.
-  D. R. Karger, S. Oh, and D. Shah, “Iterative learning for reliable crowdsourcing systems,” in NIPS, 2011, pp. 1953–1961.
-  T. M. Mitchell, Machine learning. Burr Ridge, IL: McGraw Hill, 1997.
-  C. Hartmann, P. Varshney, K. Mehrotra, and C. Gerberich, “Application of information theory to the construction of efficient decision trees,” IEEE Trans. Inf. Theory, vol. 28, no. 4, pp. 565–577, 1982.
-  J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
-  ——, C4.5: Programs for machine learning. Elsevier, 2014.
-  L. Breiman, Classification and regression trees. Routledge, 2017.
-  T.-S. Lim, W.-Y. Loh, and Y.-S. Shih, “A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms,” Machine Learning, vol. 40, no. 3, pp. 203–228, 2000.
Appendix A Proof of Proposition 1
The tail probability of a binomial distribution with a given expected success probability can be expressed using the regularized incomplete beta function:
where the arguments of the beta function are determined by the number of trials and the decision threshold.
In majority voting with an odd number of workers, each with the same expected probability of success, the probability of mis-classification at the FC can be expressed as:
Note that the number of workers can now be treated as any positive real value. Taking the partial derivative of the FC error probability with respect to the number of workers yields
From (9) to (10), we use the symmetry of the beta-function integrand and the fact that the per-worker error probability is less than 1/2. Finally, notice that the integrand is strictly positive over the interval of integration, so the derivative is strictly negative, and the probability of error at the FC is decreasing in the number of workers. Moreover, as the number of workers increases, the magnitude of this derivative strictly decreases. Thus, the rate of reduction in the error probability diminishes as more workers are added.
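The beta-function representation of the majority-vote error can be checked numerically. The sketch below uses hypothetical values (7 workers, per-worker error 0.3) and evaluates the regularized incomplete beta function by simple midpoint-rule integration rather than a library routine; it confirms the identity that the binomial upper tail P(at least m failures out of w) equals I_p(m, w - m + 1).

```python
from math import comb, gamma

def reg_inc_beta(x, a, b, steps=20000):
    """Regularized incomplete beta function I_x(a, b) via the midpoint rule."""
    beta = gamma(a) * gamma(b) / gamma(a + b)
    h = x / steps
    total = sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                for i in range(steps))
    return total * h / beta

# Majority-vote error with w = 7 workers, per-worker error p = 0.3:
w, p = 7, 0.3
m = (w + 1) // 2  # majority threshold
binom_tail = sum(comb(w, k) * p**k * (1 - p)**(w - k) for k in range(m, w + 1))
beta_form = reg_inc_beta(p, m, w - m + 1)

print(binom_tail)  # ~0.126036
print(beta_form)   # matches up to the integration error
```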