Working in Pairs: Understanding the Effects of Worker Interactions in Crowdwork

10/23/2018 ∙ by Chien-Ju Ho, et al. ∙ Washington University in St. Louis

Crowdsourcing has gained popularity as a tool to harness human brain power to help solve problems that are difficult for computers. Previous work in crowdsourcing often assumes that workers complete crowdwork independently. In this paper, we relax this independence assumption and explore how introducing direct, synchronous, and free-style interactions between workers affects crowdwork. In particular, motivated by the concept of peer instruction in educational settings, we study the effects of peer communication in crowdsourcing environments. In the crowdsourcing setting with peer communication, pairs of workers are asked to complete the same task together: they first generate their initial answers to the task independently, then freely discuss the task with each other, and finally update their answers after the discussion. We experimentally examine the effects of peer communication on several common types of tasks on crowdsourcing platforms, including image labeling, optical character recognition (OCR), audio transcription, and nutrition analysis. Our experimental results show that work quality is significantly improved in tasks with peer communication compared to tasks where workers complete the work independently. However, participating in tasks with peer communication has limited effects on workers' independent performance in tasks of the same type in the future.

1 Introduction

Crowdsourcing is a paradigm for utilizing human intelligence to help solve problems that computers alone cannot yet solve. In recent years, crowdsourcing has gained increasing popularity as the Internet makes it easy to engage the crowd to work together. On a typical crowdsourcing platform like Amazon Mechanical Turk (MTurk), task requesters may post “microtasks” that workers can complete independently in a few minutes in exchange for a small payment. A microtask might involve labeling an image, transcribing an audio clip, or determining whether a website is offensive. Much of the practice and research in crowdsourcing has made this independence assumption and has focused on designing effective aggregation methods [35, 7, 40] or incentive mechanisms [30, 16] to improve the quality of crowdwork.

More recently, researchers have started to explore the possibility of removing this independence assumption and enabling worker collaboration in crowdsourcing. One typical approach is to design workflows that coordinate crowd workers for solving complex tasks. Specifically, a workflow decomposes a complex task into multiple simple microtasks, and workers are then asked to work on different microtasks. Since decomposed microtasks may depend on each other (e.g., the output of one task may be used as the input for another), workers implicitly interact with one another and are not working independently. Along this line, a great deal of research has demonstrated that relaxing the worker independence assumption enables us to go beyond microtasks and solve various complex tasks using crowdsourcing [4, 24, 25, 36].

Another line of research has demonstrated that even when workers are working on the same microtask, enabling some form of structured interaction between workers can be beneficial as well. In particular, Drapeau et al. [13] and Chang et al. [6] have shown that, in labeling tasks, if workers are presented with alternative answers and the associated arguments, which are generated by other workers working on the same tasks, they can provide labels with higher accuracy. These results, again, imply that including worker interactions can have positive impacts on crowdwork.

In both these lines of research, however, interactions between workers are indirect and constrained by the particular format of information exchange that is pre-defined by requesters (e.g., the input-output handoffs in workflows, the elicitation of arguments for workers’ answers). Such forms of worker interaction can be context-specific and may not be easily adapted to different contexts. For example, it is unclear whether presenting alternative answers and arguments would still improve worker performance for tasks other than labeling, where it can be hard for workers to provide a simple justification for their answers.

Naturally, one may ask what would happen if we introduced direct, synchronous, and free-style worker interactions into crowdwork. We refer to this alternative type of worker interaction as peer communication, and in this paper, we focus on understanding the effects of peer communication when a pair of workers work on the same microtask. In particular, inspired by the concept of peer instruction in educational settings [8], we operationalize peer communication as a procedure where a pair of workers working on the same task are asked to first provide an independent answer each, then freely discuss the task, and finally provide an updated answer after the discussion. We ask the following two questions to understand the effects of peer communication:

  • Does peer communication improve the quality of crowdwork, and if so, why?
    Empirical studies of peer instruction suggest that students are more likely to provide correct answers to test questions after discussing with their peers [8]. Moreover, previous work in crowdsourcing also demonstrates that indirect worker interactions (e.g., showing workers the arguments from other workers) [13, 6] improve the quality of crowdwork for labeling tasks. We are thus interested in exploring whether peer communication, a more general form of worker interaction, could also have positive impacts on the quality of crowdwork for a more diverse set of tasks.

  • Can peer communication be used to train workers, so that they achieve better independent performance on the same type of tasks in the future?
    It is observed that students learning with peer instruction obtain higher grades when they (independently) take the post-tests at the end of the semester [8]. Moreover, previous work in crowdsourcing also shows that some types of indirect worker interactions (e.g., asking workers to review or verify the results of other workers in the same type of task) can enhance workers’ independent performance on similar tasks in the future [12, 42]. We are thus interested in examining whether peer communication could also be an effective approach to training workers.

We design and conduct experiments on Amazon Mechanical Turk to answer these questions. In our first set of experiments, we examine the effects of peer communication with three of the most commonly seen tasks in crowdsourcing markets: image labeling, optical character recognition, and audio transcription. Experimental results show that workers in tasks with peer communication perform significantly better than workers who work independently. The results are robust and consistently observed for all three types of tasks. By looking into the logs of worker discussions, we find that most workers engage in constructive conversations and exchange information that their peer might not have noticed or known. This observation reinforces our belief that consistent quality improvements can be obtained by introducing peer communication in crowdwork. However, unlike in the educational setting, workers who have completed tasks with peer communication do not produce independent work of higher quality on the same type of tasks in the future.

We then conduct a second set of experiments with nutrition analysis tasks to examine the effects of peer communication in training workers for future tasks in more depth. The experimental results suggest that workers’ independent performance in future tasks improves only when the future tasks are conceptually related to the training tasks (i.e., the tasks where peer communication happens) and when workers are given expert feedback after peer communication. Moreover, such improvement is likely caused by the expert feedback rather than the peer communication procedure. In other words, we find that peer communication, per se, may have limited effectiveness in training workers towards better independent performance, at least for microtasks in crowdsourcing.

Our current study focuses on one-to-one communication between workers on microtasks. We believe our results provide implications for the potential benefits of introducing direct interactions among multiple workers in complex and more general tasks, and we hope more experimental research will be conducted in the future to carefully understand the effects of peer communication in various crowdsourcing contexts.

1.1 Related Work

A major line of research in crowdsourcing is to design effective quality control methods. Most of the work in this line has assumed that workers complete tasks independently. One theme in the quality control literature is the development of statistical inference and probabilistic modeling methods for aggregating workers’ answers. Given a batch of noisy inputs, the EM algorithm [11] can be adopted to learn the skill level of workers and obtain estimates of the best answer [35, 7, 20, 40, 10]. There have also been extensions that consider task assignment in the context of these probabilistic worker models [21, 22, 15]. Another theme is to design incentive mechanisms that motivate workers to contribute high-quality work. Incentives that researchers have studied include monetary payments [30, 17, 41, 16] and intrinsic motivation [28, 37, 39]. In addition, gamification [1], badges [2], and virtual points [18] have also been explored to steer workers’ behavior.

The goal of this work is to explore the effects of worker interactions in crowdsourcing environments. Researchers have explored implicit worker interactions through the design of workflows that coordinate multiple workers. In a workflow, a task is decomposed into multiple microtasks, which often depend on each other, e.g., the output of one microtask serves as the input for another. As a result, workers implicitly interact with each other. For example, Little et al. [29] propose the Improve-and-Vote workflow, in which some workers work on improving the current answer while other workers vote on whether the updated answers are better than the original ones. Dai et al. [9] apply partially observable Markov decision processes (POMDPs) to better set the parameters of these workflows (e.g., how many workers should be involved in voting). More complicated workflows have also been proposed to solve complex tasks using a team of crowd workers [33, 24, 25, 36]. These workflow-based approaches enable crowdsourcing to solve not only microtasks but also more complex tasks. However, worker interactions in this approach are often implicit and constrained (e.g., through the input-output handoffs). We are interested in studying the effects of direct and free-style communication between workers in crowdwork.

Our work focuses on worker interactions when workers work on the same microtask and is related to that of Drapeau et al. [13] and Chang et al. [6]. Drapeau et al. [13] propose an Assess-Justify-Reconsider workflow for labeling tasks: given a labeling task, workers first assess the task and give their answers independently; workers are then asked to come up with arguments to justify their answers; finally, workers are presented with arguments for a different answer and are asked to reconsider their own. They show that applying this workflow greatly improves the quality of answers generated by crowd workers. Chang et al. [6] propose a similar Vote-Explain-Categorize workflow with the additional goal of collecting useful arguments as labeling guidelines for future workers. Both these studies relax the independence assumption and enable worker interactions by presenting the arguments of another worker. However, they focus only on classification tasks (e.g., answering whether there is a cat in the image), and the worker interactions are limited to presenting arguments from another worker. In this work, we are interested in enabling a more general form of interaction (i.e., direct, synchronous, and free-style communication) for more diverse types of tasks. In particular, in addition to classification tasks, we explore worker interactions in optical character recognition and audio transcription. It is not obvious how the above two workflows could be applied to these tasks, as workers might not know how to generate arguments for them without interacting with fellow workers in real time.

Regarding the role of worker interactions in “training” workers, previous research [42, 12] suggests that, for complex tasks, introducing limited forms of implicit worker interaction, e.g., providing (expert or peer) feedback to workers after they complete the tasks or asking workers to review or verify the work produced by other workers, can improve workers’ performance in the future. In this work, we focus on examining whether direct, synchronous, and free-style communication (instead of one-directional feedback or reviewing) can be an effective training method for improving workers’ independent performance in microtasks.

Niculae and Danescu-Niculescu-Mizil [32] have explored whether interactions can help improve workers’ output. They designed an online game in which players discuss together to identify the location where a given photo was taken. However, their focus is on applying natural language processing techniques to predict whether a discussion will be constructive based on analyzing users’ chat logs. Their results could be useful and interesting to apply in our setting.

This work adopts techniques from peer instruction, an interactive learning approach that is widely adopted across institutions and disciplines [8, 14, 26, 34, 31] and has been empirically shown to better engage students and help them achieve better learning performance. We provide more details on the concept of peer instruction in the next section.

2 Peer Communication in Crowdwork

In this section, we give a brief introduction to the concept of peer instruction. We then describe our approach of peer communication, which adapts peer instruction to crowdsourcing environments.

2.1 Peer Instruction in Educational Settings

Peer instruction is an interactive learning method developed by Eric Mazur that aims to engage students for more effective learning during classes. Unlike traditional teaching methods, which are typically centered on the instructor, who conveys knowledge to students one-sidedly through pure lectures, peer instruction creates a student-centered learning environment where students can instruct and learn from each other.

Figure 1: Questioning procedure of peer instruction.

More specifically, peer instruction involves students first learning outside of class by completing pre-class readings and then learning in class by engaging in a conceptual question answering process. Figure 1 summarizes the in-class questioning procedure of peer instruction. The procedure starts with the instructor posing to students a question that is related to one concept in the pre-class readings. Students are then asked to reflect on the question, formulate answers on their own, and report their answers to the instructor. Next, students can discuss the question with their fellow students (in practice, if most of the students answer the question correctly on their own, the instructor may decide to skip the discussion phase and move on to the next concept). During the discussion, students are encouraged to articulate the underlying reasoning of their answers and convince their peers that their answers are correct. After the discussion, each student reports to the instructor a (final) updated answer, which may or may not differ from her initial answer before the discussion. Lastly, after reviewing students’ final responses, the instructor decides either to provide more explanation on the concept associated with the current question or to move on to the next concept.

The peer instruction method has been widely adopted in a large number of institutions and disciplines [8, 14, 26]. Intuitively, peer instruction may improve learning as students become active knowledge producers instead of passive knowledge consumers. Compared to instructors, students might be able to provide explanations that are better understood by other students, as they share similar backgrounds. Empirical observations from deployments of peer instruction confirm that it successfully improves students’ learning performance [8]. In particular, students are more likely to provide correct answers to the conceptual question after discussing with peers than before the discussion. Moreover, in post-tests where students independently answer a set of test questions after the end of the semester, students who participate in courses taught with peer instruction perform significantly better than students who don’t. This empirical evidence suggests that peer instruction helps students understand not only the current question but also the underlying concepts, which eventually helps them obtain better independent performance in future tests.

2.2 Peer Communication: Adapting Peer Instruction to Crowdwork

We propose to study the effects of peer communication in crowdwork, applying the idea of peer instruction as a principled approach to structure direct interactions among crowd workers. In particular, given a particular microtask, we consider its requester as the “instructor,” all workers working on it as “students,” and the task per se as the “conceptual question” posed by the instructor. Hence, a natural way to adapt peer instruction to crowdsourcing is to ask each worker working on the same task to first complete the task independently, and then allow them to discuss the task with each other before submitting their final answers.

The success of peer instruction in improving students’ learning performance in the educational domain implies the possibility of using such a strategy to enhance the quality of work in crowdsourcing, both on tasks where peer communication takes place and on future tasks of the same type. However, it is unclear whether the empirical findings on the effects of peer instruction in educational settings can be directly generalized to the crowdsourcing domain. For example, while conceptual questions in educational settings typically involve problems that require specialized knowledge or domain expertise, crowdwork is often composed of “microtasks” that only ask for simple skills or basic intelligence. Moreover, in peer instruction, the instructor can provide additional explanations to clarify confusion students might have during the discussion. However, in peer communication, requesters often do not know the ground truth of the tasks and are not able to provide additional feedback after workers submit their tasks.

Therefore, in this work, we aim to examine the effects of peer communication in crowdsourcing, and in particular, whether peer communication has positive effects on the quality of crowdwork. More specifically, based on the empirical evidence on the effectiveness of peer instruction in education as well as the positive effects of indirect worker interactions demonstrated in previous research, we postulate two hypotheses on the effects of peer communication in crowdwork:

  • Hypothesis 1 (H1): Workers produce work of higher quality in tasks with peer communication than in tasks where they work independently.

  • Hypothesis 2 (H2): After peer communication, workers are able to produce independent work of higher quality on the same type of tasks in the future.

We design and conduct a series of large-scale online experiments to test these two hypotheses. In our experiments, we operationalize the procedure of peer communication between pairs of workers who are working on the same microtask, and we leave the examination of the effects of peer communication among larger groups of crowd workers on more complex tasks as future work. It is also worthwhile to note that, in this work, we focus on adapting the worker-interaction component of peer instruction (i.e., the yellow-shaded boxes in Figure 1). However, in practice, the requester can often make interventions to improve the efficiency of the peer communication process (e.g., given the initial answers submitted before discussion, the requester can dynamically decide how to present information or match workers to make the discussions more effective). The study of requester interventions in peer communication is out of the scope of the current paper, but it is another direction that warrants further research.

3 Experiment 1: How Does Peer Communication Affect Quality of Crowdwork?

To examine how introducing direct communication between pairs of workers in crowdwork affects the work quality, we design and conduct a set of online experiments on Amazon Mechanical Turk (MTurk) with three types of microtasks that commonly appear on crowdsourcing platforms, including image labeling, optical character recognition (OCR), and audio transcription.

3.1 Independent Tasks vs. Discussion Tasks

As previously stated in our hypotheses, we are interested in understanding whether allowing workers to work in pairs and directly communicate with each other about the same tasks leads to work of better quality than when workers complete the work independently, both on tasks where peer communication happens (H1) and on future tasks of the same type after peer communication takes place (H2). To do so, in our experiments, we consider both tasks with peer communication and tasks without peer communication:

  • Independent tasks (tasks without peer communication). In an independent task, workers are instructed to complete the task on their own.

  • Discussion tasks (tasks with peer communication). Workers in a discussion task are guided to communicate with other workers to complete the task together, following a process adapted from the peer instruction procedure as we have discussed in Section 2.2. In particular, each worker is paired with another “co-worker” on a discussion task. Both workers in the pair are first asked to work on the task independently and submit their independent answers. Then, the pair of workers enter a chat room, where they can see each other’s independent answer to the task, and they are given two minutes to discuss the task freely. Workers are instructed to explain to each other why they believe their answers are correct. After the discussion, both workers get the opportunity to update their answers and submit their final answers.
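
To make the procedure concrete, the following is a minimal sketch of the control flow of one discussion task. The function and parameter names are illustrative assumptions, not the authors' implementation; the answering and discussion steps would be supplied by the experiment platform.

```python
# Illustrative sketch of one discussion task (not the authors' code).
DISCUSSION_SECONDS = 120  # the pair may chat for two minutes

def run_discussion_task(task, answer_fn_a, answer_fn_b, discuss_fn):
    # Step 1: both workers answer the same task independently.
    initial_a = answer_fn_a(task)
    initial_b = answer_fn_b(task)

    # Steps 2-3: each worker sees the other's independent answer, the pair
    # discusses freely for a fixed time window, and each worker may then
    # revise and submit a final answer.
    final_a, final_b = discuss_fn(task, initial_a, initial_b,
                                  time_limit=DISCUSSION_SECONDS)
    return (initial_a, final_a), (initial_b, final_b)
```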

Figure 2: The two treatments used in our experiments. This design enables us to examine Hypothesis 1 (through comparing work quality in Session 1) and Hypothesis 2 (through comparing work quality in Session 2), while not creating significant differences between the two treatments (Session 3 is added so that the two treatments contain an equal number of independent tasks and discussion tasks).

3.2 Treatments

We conduct randomized experiments to examine our hypotheses regarding the effects of peer communication on the quality of crowdwork. The most straightforward experimental design would include two treatments, where workers in one treatment are asked to work on a sequence of independent tasks while workers in the other treatment complete a sequence of discussion tasks. However, since the structures of independent tasks and discussion tasks are fundamentally different—discussion tasks naturally require more time and effort from workers but can be more interesting to them—we might observe significant self-selection biases in the experiments (i.e., workers may self-select into the treatment where they can complete tasks faster or which they find more enjoyable) if we adopted such a design.

To overcome the drawback of this simple design, we design our experimental treatments so that each treatment consists of the same number of independent tasks and discussion tasks, such that neither treatment appears to be obviously more time-consuming or enjoyable. Figure 2 illustrates the two treatments used in our experiments. In particular, we bundle 6 tasks in each HIT (HIT stands for Human Intelligence Task, one unit of work on MTurk that a worker can accept to work on). When a worker accepts our HIT, she is told that there are 4 independent tasks and 2 discussion tasks in the HIT. There are two treatments in our experiments: in Treatment 1, workers are asked to complete 4 independent tasks followed by 2 discussion tasks, while workers in Treatment 2 first work on 2 discussion tasks and then complete 4 independent tasks. Importantly, we do not tell workers the ordering of the 6 tasks, which helps us minimize self-selection biases, as the two treatments look the same to workers. We refer to the first, middle, and last two tasks in the sequence as Sessions 1, 2, and 3 of the HIT, respectively.

Given the way we design the treatments, we can examine H1 by comparing the work quality produced in Session 1 (i.e., the first two tasks of the HIT) between the two treatments. Intuitively, observing higher work quality in Session 1 of Treatment 2 would imply that peer communication can enhance work quality above the level of independent worker performance. Similarly, we can test H2 by comparing the work quality in Session 2 (i.e., the middle two tasks of the HIT) between the two treatments. H2 is supported if the work quality in Session 2 of Treatment 2 is also higher than that of Treatment 1, which would suggest that after communicating with peers, workers produce higher-quality independent work for the same type of tasks. Finally, Session 3 (i.e., the last two tasks of the HIT) is used to ensure that the two treatments require a similar amount of work from workers.
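
For concreteness, the two treatments and the session slicing used to test H1 and H2 can be written down directly. This is only an illustrative encoding of Figure 2; the identifiers are assumptions, not taken from the experiment software.

```python
# Illustrative encoding of the two treatments in Figure 2 (task types only).
TREATMENT_1 = ["independent"] * 4 + ["discussion"] * 2
TREATMENT_2 = ["discussion"] * 2 + ["independent"] * 4

def sessions(hit_tasks):
    """Split a 6-task HIT into Sessions 1, 2, and 3 (two tasks each)."""
    return hit_tasks[0:2], hit_tasks[2:4], hit_tasks[4:6]

# H1: compare work quality in Session 1 across treatments
#     (independent work in Treatment 1 vs. discussion tasks in Treatment 2).
# H2: compare work quality in Session 2, where both treatments contain
#     independent tasks, to look for a carry-over effect of peer communication.
session1_t1, session2_t1, _ = sessions(TREATMENT_1)
session1_t2, session2_t2, _ = sessions(TREATMENT_2)
```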

3.3 Experimental Tasks

We conduct our experiments on three types of tasks: image labeling, optical character recognition (OCR), and audio transcription. All three are common task types on crowdsourcing platforms, so experimental results on them allow us to understand how peer communication affects the quality of crowdwork across typical kinds of tasks.

  • Image labeling. In each image labeling task, we present one image to the worker and ask her to identify whether the dog in the image is a Siberian Husky or a Malamute. The dog images we use are collected from the Stanford Dogs dataset [23]. Since the task can be difficult for workers who are not familiar with dog breeds, we provide workers with a table summarizing the characteristics of each breed, as shown in Figure 3. Workers can access this table at any time while working on the HIT.

  • Optical character recognition (OCR). For the OCR task, workers are asked to transcribe vehicles’ license plate numbers from photos. The photos are taken from the dataset provided by Shah and Zhou [38], and some examples are shown in Figure 4.

  • Audio transcription. For the audio transcription task, workers are asked to transcribe an audio clip which contains approximately 5 seconds of speech. The audio clips are collected from VoxForge (http://www.voxforge.org).

Figure 3: The instruction of the image labeling task.
Figure 4: Examples of photos used in the OCR task.

Unlike in the image labeling task, we do not provide additional instructions for the OCR and audio transcription tasks. Indeed, for some types of crowdwork, it is difficult for requesters to provide detailed instructions. However, the existence of detailed task instructions may influence the effectiveness of peer communication (e.g., workers in the image labeling tasks can simply discuss with their co-workers whether each distinguishing feature covered in the instructions is present in the dog image). Thus, examining the effect of peer communication on work quality for different types of tasks, where detailed instructions may or may not be possible, helps us understand whether such an effect depends on particular elements of task design.

3.4 Experimental Procedure

Introducing direct communication between pairs of workers on the same tasks requires us to synchronize the work pace of pairs of workers, which is quite challenging as discussed in previous research on real-time crowdsourcing [4, 3]. We address this challenge by dynamically matching workers together and sending pairs of workers to simultaneously start working on the same sequence of tasks.

In particular, when a worker arrives at our HIT, we first check whether there is another worker in our HIT who doesn’t have a co-worker yet—if so, she is matched to that worker and assigned to the same treatment and task sequence as that worker. Otherwise, the worker is randomly assigned to one of the two treatments as well as a random sequence of tasks, and she is asked to wait for another co-worker to join the HIT for a maximum of 3 minutes. We prompt the worker with a beep sound if another worker arrives at our HIT during this 3-minute waiting period. Once we successfully match a pair of workers, both of them are automatically redirected to the first task in the HIT, and they start working on the HIT simultaneously. In the case where no other worker arrives at our HIT within 3 minutes, we ask the worker to decide whether she is willing to complete all tasks in the HIT on her own (in which case we drop her data from the analysis but still pay her accordingly) or to keep waiting for another 3 minutes and receive a compensation of 5 cents for waiting.
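
The matching logic can be sketched as follows; this is a simplified, single-process illustration of the behavior described above (the deployed system runs on a web server and additionally handles concurrency, the waiting timeout, and the beep notification).

```python
# Simplified sketch of the worker-matching logic (illustrative only).
import random

WAIT_LIMIT_SECONDS = 180           # maximum wait for a co-worker to arrive
waiting_pool = []                  # workers who arrived but are still unmatched

def on_worker_arrival(worker_id):
    """Match the arriving worker with a waiting worker if one exists.

    Returns (worker_id, co_worker_id, treatment) on a successful match, or
    None if the worker must wait (up to WAIT_LIMIT_SECONDS) for a partner.
    """
    if waiting_pool:
        partner = waiting_pool.pop(0)
        # The new arrival joins the waiting worker's treatment and task order.
        return worker_id, partner["id"], partner["treatment"]
    # No partner available: assign a random treatment and start waiting.
    waiting_pool.append({
        "id": worker_id,
        "treatment": random.choice(["Treatment 1", "Treatment 2"]),
    })
    return None
```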

For all types of tasks, we provide a base payment of 60 cents for the HIT. In addition to the base payment, workers have the opportunity to earn performance-based bonuses; that is, a worker earns a bonus of 10 cents in a task if the final answer she submits for that task is correct. Our experiment HITs are open to U.S. workers only, and each worker is allowed to take only one HIT for each type of task.

3.5 Experimental Results

For the image labeling, OCR, and audio transcription tasks, we obtain data from 388, 382, and 250 workers through our experiments, respectively. (We targeted around 200 workers for each treatment, i.e., about 400 workers for each experiment. However, we had difficulty recruiting the targeted number of workers for the audio transcription tasks, probably because workers considered the payment not high enough for audio transcription; we fixed the payment magnitude to be the same across the three types of tasks.) We then examine Hypotheses 1 and 2 separately for each type of task by analyzing experimental data collected from Sessions 1 and 2 of the HIT, respectively. It is important to note that, in the experimental design phase, we decided not to include data collected from Session 3 of the HIT in our formal analyses. This is because workers in Session 3 of the two treatments differ from each other both in whether they have communicated with other workers about the work in previous tasks and in whether they can communicate with other workers in the current tasks, making it difficult to draw causal conclusions about the effect of peer communication. However, as we will mention below, analyzing the data collected in Session 3 leads to observations that are consistent with our findings.

3.5.1 Work Quality Metrics

We evaluate work quality using the notion of error. Specifically, in the image labeling task, since workers can only submit binary labels (i.e., Siberian Husky or Malamute), the error is defined as the binary classification error—if a worker provides a correct label, the error is 0; otherwise, the error is 1. For the OCR and audio transcription tasks, since workers’ answers and the ground truth answers are both strings, we define the error as the edit distance between the worker’s answer and the ground truth, divided by the number of characters in the ground truth. Naturally, in all types of tasks, a lower error implies higher work quality.
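
As a concrete reference, the two error metrics can be computed as follows. This is a minimal sketch assuming the ground-truth answers are available; it is not the authors' evaluation code.

```python
# Work quality metrics: binary error and length-normalized edit distance.
def binary_error(label, truth):
    """Image labeling: 0 if the submitted label is correct, 1 otherwise."""
    return 0 if label == truth else 1

def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete ca
                                     dp[j - 1] + 1,      # insert cb
                                     prev + (ca != cb))  # substitute
    return dp[-1]

def normalized_error(answer, truth):
    """OCR / audio transcription: edit distance divided by ground-truth length."""
    return edit_distance(answer, truth) / len(truth)
```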

Figure 5: Examining whether workers produce higher work quality in tasks with peer communication than in tasks without peer communication. In the “Independent” group, we calculate workers’ average error rate in Session 1 of Treatment 1 (see Figure 2). In the “Peer Communication (Before Discussion)” group, we calculate workers’ average error rate in Session 1 of Treatment 2, before they communicate with co-workers about the work (i.e., for their independent answers). In the “Peer Communication (After Discussion)” group, we calculate workers’ average error rate in Session 1 of Treatment 2, after they communicate with co-workers about the work (i.e., for their final answers). Error bars indicate the mean ± one standard error.

3.5.2 Work Quality Improves in Tasks with Peer Communication

We start by examining Hypothesis 1, comparing the work quality produced in Session 1 of the two treatments for each type of task. In Figure 5, we plot the average error rate for workers’ final answers in Session 1 of Treatments 1 and 2 using white and black bars, respectively. Visually, it is clear that for all three types of tasks, the work quality is higher after workers communicate with others about the work than when workers complete the work on their own. We further conduct two-sample t-tests to check the statistical significance of the differences; the p-values for the image labeling, OCR, and audio transcription tasks all indicate that the improvement in work quality is statistically significant. Our experimental results thus support Hypothesis 1.
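
For reference, the significance test used above is a standard two-sample t-test over per-worker error rates; a minimal sketch is shown below. The arrays are toy data for illustration, not the experimental measurements.

```python
# Two-sample t-test comparing per-worker error rates across treatments.
from scipy import stats

errors_independent = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]  # Session 1, Treatment 1
errors_peer_final  = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]  # Session 1, Treatment 2 (final answers)

t_stat, p_value = stats.ttest_ind(errors_independent, errors_peer_final)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```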

Our consistent observations on the effectiveness of peer communication in enhancing the quality of crowdwork for various types of tasks indicate that enabling direct, synchronous, and free-style communication between pairs of workers who work on the same tasks might be a simple method for improving worker performance that can be easily adapted to different contexts. To further highlight the advantage of peer communication, we apply majority voting to aggregate the labels obtained during Session 1 of the image labeling tasks (since there is no straightforward way to aggregate workers’ answers in the other two types of tasks, we perform the aggregation for image labeling tasks only), and the results are presented in Figure 6. The X-axis represents the number of workers from whom we elicit labels for each image, and the Y-axis represents the prediction error (averaged across all images) of the aggregate label decided by the majority voting rule. As we can see in the figure, the aggregation error using labels obtained from tasks with peer communication is substantially lower than the aggregation error using labels from independent work. Moreover, in independent tasks, a majority of workers provide incorrect labels for approximately 20% of the images (therefore, the prediction error converges to near 20%), while in tasks with peer communication, this aggregated error reduces to only around 10%. These results reaffirm the superior quality of data collected through tasks with peer communication.
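
The aggregation in Figure 6 uses plain majority voting over the labels elicited for each image. A minimal sketch of the computation is below; the data structures are illustrative, not the collected labels.

```python
# Majority-vote aggregation error for the image labeling task (illustrative).
import random
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def aggregation_error(labels_per_image, truth_per_image, k):
    """Average error of the majority label when k labels are sampled per image."""
    errors = []
    for image, truth in truth_per_image.items():
        sampled = random.sample(labels_per_image[image], k)
        errors.append(0 if majority_vote(sampled) == truth else 1)
    return sum(errors) / len(errors)
```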

Figure 6: The aggregation error for the image labeling task after using majority voting for aggregation.

A natural question one may then ask is why work quality improves in tasks with peer communication. One possible contributing factor is social pressure: workers may put in more effort, and thus produce higher-quality work, simply because they know that they are working with a co-worker on the same task and are going to discuss the task with that co-worker. Another possibility is that constructive conversations between workers enable effective knowledge sharing and lead to the improvement in work quality. To better understand the role of these two factors in influencing work quality in tasks with peer communication, we conduct a few additional analyses.

The impacts of social pressure.

First, we look into whether workers behave differently when they are working on their independent answers in tasks with peer communication and when they are working on tasks without peer communication. Intuitively, if workers are affected by social pressure in tasks with peer communication, they may spend more time on the tasks and possibly produce work of higher quality even at the stage when they work on the tasks on their own, before communicating with their co-workers. Table 1 summarizes the amount of time workers spend on tasks in Session 1 of Treatment 1, and in Session 1 of Treatment 2 while they work on their independent answers. We find that, overall, knowing of the existence of a co-worker who works on the same task makes workers spend more time on the task on their own, though the differences are not always significant. In addition, we plot the average error rate for workers’ independent answers in Session 1 of Treatment 2 as gray bars in Figure 5. Comparing the white and gray bars in Figure 5, we find that workers only significantly improve the quality of their independent answers in the audio transcription tasks when they know of the existence of a co-worker. Together, these results imply that workers in tasks with peer communication might be affected by social pressure to some degree, but social pressure is likely not the major cause of the work quality improvement in tasks with peer communication.

| Task Type | Treatment 1 | Treatment 2 (before discussion) | p-value |
| --- | --- | --- | --- |
| Image labeling | 16.34 (0.78) | 21.43 (0.97) | 4.165 |
| OCR | 23.63 (1.00) | 26.13 (1.13) | 0.099 |
| Audio transcription | 55.91 (3.38) | 59.90 (2.62) | 0.353 |
Table 1: Comparison of the average amount of time (in seconds) workers spend on independently working on a task in Session 1 of Treatment 1 and 2; mean values and standard errors (in parentheses) are reported. Two-sample t-tests are used to examine whether the differences are statistically significant, and p-values are reported in the last column.
Figure 7: Examples of chat logs.
The impacts of constructive conversations.

Next, we examine whether the conversations between co-workers, by themselves, help workers in tasks with peer communication improve their work quality. We compare the quality of workers’ independent answers before discussion (gray bars in Figure 5) and their final answers after discussion (black bars in Figure 5) in Session 1 of Treatment 2. We find that workers in tasks with peer communication submit final answers of higher quality after discussion than their independent answers before discussion. We further conduct paired t-tests on workers’ error rates before and after discussion for tasks in Session 1 of Treatment 2, and the test results show that the difference is statistically significant for all three types of tasks (image labeling, OCR, and audio transcription). In fact, we reach the same conclusion if we conduct a similar analysis for the work quality produced before and after discussions in Session 3 (i.e., the last two tasks) of Treatment 1. That is to say, the communication between co-workers about the same piece of work consistently leads to a significant improvement in work quality.

To gain some insight into what workers communicate with each other during the discussion, we show a few representative examples of chat logs in Figure 7. We find that workers mostly engage in constructive conversations during the discussions. In particular, workers not only try to explain to each other the reasons behind their independent answers and deliberate on whose answer is more convincing (as shown in the example for the image labeling tasks), but they also try to work on the tasks jointly (as shown in the example for the OCR tasks). Throughout the conversations, workers communicate their confidence in their answers (e.g., “I’m not sure after that…”) as well as their strategies for solving the tasks (e.g., “he pronounces ‘was’ with a v-sound instead of the w-sound”). Note that much of the discussion shown in Figure 7 would hardly be possible without allowing workers to directly interact and exchange information with each other in real time, which implies the necessity of direct, synchronous, free-style interactions between workers in crowdwork.

To briefly summarize, we have consistently found that enabling peer communication between pairs of workers can enhance work quality for various types of tasks above the level of independent worker performance, which can be partly attributed to the social pressure brought about by the peer communication process, but is mostly due to the constructive conversations between workers about the work. These results indicate that introducing peer communication in crowdwork can be a simple, generalizable approach to enhancing work quality.

3.5.3 Effects of Peer Communication on Work Quality in Future Tasks

We now move on to examine our Hypothesis 2: compared to workers who have never been involved in peer communication, do workers who have participated in tasks with peer communication continue to produce work of higher quality in future tasks of the same type, even if they need to complete those tasks on their own? In other words, is there any “spillover effect” of peer communication on the quality of crowdwork, such that peer communication can be used as a “training” method to enhance workers’ independent work quality in the future?

To answer this question, we compare the work quality produced in Session 2 (i.e., the middle two independent tasks) of the two treatments for all three types of tasks, and results are shown in Figure 8. As we can see in the figure, there are no significant differences in work quality between treatments, indicating that after participating in tasks with peer communication, workers are not able to maintain a higher level of quality when they complete tasks of the same type on their own in the future. Therefore, our observations in Session 2 of the two treatments do not support Hypothesis 2. To fully understand whether and when Hypothesis 2 can be supported, we continue to conduct a set of follow-up experiments, which we will describe in detail in the next section.

Figure 8: Examining whether the work quality in future tasks of the same type increases after workers participate in tasks with peer communication. In the “No Peer Communication” group, we calculate workers’ average error rate in Session 2 of Treatment 1. In the “After Peer Communication” group, we calculate workers’ average error rate in Session 2 of Treatment 2. Error bars indicate the mean ± one standard error.

4 Experiment 2: When Does Peer Communication Affect Quality of Independent Work in Future Tasks?

The results of our previous experiment do not support Hypothesis 2; i.e., after participating in tasks with peer communication, workers do not produce work of higher quality in tasks of the same type when working independently. This is in contrast with the empirical findings of peer instruction in educational settings, even though the procedure of peer communication is adapted from peer instruction. We conjecture that two factors may have contributed to this observed difference.

First, for peer instruction, the concepts covered in post-tests (e.g., when students answer the test questions on their own after the instruction ends) are often the same as the concepts discussed during the peer instruction process in class. Therefore, knowledge learned from the peer instruction process can be directly transferred to post-tests. This is not necessarily true for peer communication in crowdwork—for example, when workers are asked to complete a sequence of tasks distinguishing Siberian Huskies from Malamutes, it is possible that the distinguishing feature for the dog in one task is its eyes while the distinguishing feature for the dog in another task is its size, so the knowledge workers may have learned in tasks with peer communication is not always useful in future tasks that are somewhat unrelated.

In addition, as we have discussed in Section 2.2, compared to the standard peer instruction procedure, we remove the last step (see Figure 1), where the requester provides expert feedback to workers after reviewing workers’ final answers in the peer communication process (this step would be equivalent to the instructor reviewing students’ final responses and providing more explanation as needed in peer instruction), due to the low availability of expert feedback. It is thus possible that workers’ quality improvement in future independent work can only be obtained when additional expert feedback is provided after peer communication.

Therefore, in this section, we conduct an additional set of experiments to examine whether these two factors have an impact on the effectiveness of peer communication as a tool for training workers, and thus seek a better understanding of whether and when peer communication can affect workers’ independent performance in the future.

4.1 Experimental Tasks

In this study, we use the nutrition analysis tasks provided in the work by Burgermaster et al. [5] in our experiments. In each nutrition analysis task, we present a pair of photographs of mixed-ingredient meals to workers. Workers are asked to identify which meal in the pair contains a higher amount of a specific macronutrient (i.e., carbohydrate, fat, protein, etc.). To help workers figure out the main ingredients of the meals in each photograph, we also attach a textual description to each photograph. Figure 9(a) shows an example of a nutrition analysis task.

We choose the nutrition analysis tasks for two reasons. First, each nutrition analysis task is associated with a “topic,” which is the key concept underlying the task. For example, the topic for the task shown in Figure 9(a) is that nuts (contained in peanut butter) are important sources of protein. Knowing the topic of each task, we can place tasks of the same topic in sequence and examine whether, after participating in tasks with peer communication, workers improve their independent work quality on related tasks (i.e., tasks that share the same underlying concept). Second, we have access to expert explanations for each nutrition analysis task (see Figure 9(b) for an example), which allows us to test whether peer communication has to be combined with expert feedback to influence workers’ independent performance in future tasks.

We note that the underlying concepts and expert feedback are often hard to obtain in crowdsourcing tasks, since requesters do not know the ground truth. The purpose of this follow-up study is to provide better insight into the conditions under which peer communication can be an effective tool for training workers.

(a) Example of a nutrition analysis task
(b) Expert feedback for the above nutrition analysis task
Figure 9: Example of a nutrition analysis task and the expert explanation associated with it.

4.2 Experimental Design

We explore whether peer communication can be used to train workers when combined with the two factors we discuss above: whether tasks are conceptually similar (i.e., whether future tasks are related to tasks where peer communication is enabled), and whether expert feedback is provided at the end of peer communication.

In particular, we aim to answer whether peer communication is effective in training workers when (a) tasks are conceptually similar but no expert feedback is given at the end of peer communication, (b) tasks are not conceptually similar but expert feedback is given at the end of peer communication, or (c) tasks are conceptually similar and expert feedback is given at the end of peer communication. For both (b) and (c), if the answer is positive, a natural follow-up question is whether the improvement in independent work quality in future tasks should be attributed to the expert feedback or to the peer communication procedure.

Corresponding to these three questions, we design three sets of experiments. All the experiments share the same structure as the experiments we have designed in the previous section. That is, we include two treatments in each experiment, where Treatment 1 contains 4 independent nutrition analysis tasks followed by 2 discussion tasks and Treatment 2 contains 2 discussion tasks followed by 4 independent tasks. We highlight the differences in the design of these three experiments in the following.

4.2.1 Experiment 2a

Unlike in the experiments of Section 3, tasks within a HIT in this experiment are not randomly selected. Instead, for both treatments, tasks in Session 1 are randomly picked, while tasks in Session 2 are selected from those that share the same topics as the tasks in Session 1. This experiment is designed to understand whether peer communication can lead to better independent work quality in related tasks in the future. Naturally, if workers achieve better performance in Session 2 of Treatment 2 than in Session 2 of Treatment 1, we may conclude that workers improve their independent work quality after participating in tasks with peer communication, but only for related tasks that share similar concepts with the tasks they have discussed with other workers.

4.2.2 Experiment 2b

The main difference between this experiment and the experiments of Section 3 is the presence of expert feedback. Specifically, for all discussion tasks in this experiment, after workers submit their final answers, we display extra information to workers, including feedback on whether the worker’s final answer is correct and an expert explanation of why the answer is correct or wrong (see Figure 9(b) for an example). Workers are asked to spend at least 15 seconds reading this information before they can proceed to the next page of the HIT. Note that in this experiment, the tasks included in a HIT are randomly selected. Comparing workers’ performance in Session 2 of the two treatments in this experiment thus informs us whether adding expert feedback at the end of the peer communication procedure leads to improvement in independent work quality in future tasks, which may or may not be related to the tasks for which peer communication is enabled.

4.2.3 Experiment 2c

Our final experiment is the same as Experiment 2b, except for one small difference—tasks included in Session 2 have the same topics as tasks in Session 1. This experiment then allows us to understand whether workers’ independent work quality improves after they participate in tasks with peer communication, when expert feedback is combined with peer communication and future tasks are related to the tasks with peer communication.

Our experiments are open to U.S. workers only, and each worker is allowed to participate in only one experiment.

4.3 Experimental Results

In total, 386, 432 and 334 workers have participated in Experiments 2a, 2b and 2c, respectively. Figure 10 shows the results on the comparison of work quality in the two treatments for all three experiments.

First, we notice that in all three experiments, there are significant differences in work quality between the two treatments for tasks in Session 1, which reaffirms our finding that workers significantly improve their performance in tasks with peer communication compared to when they work on the tasks by themselves (the p-values for two-sample t-tests on workers’ error rates in Session 1 are 0.005 and 0.007 for Experiments 2b and 2c, respectively, and the difference is also statistically significant for Experiment 2a).

Furthermore, for tasks in Session 2, we find that workers do not exhibit much difference in work quality between the two treatments in Experiment 2a or 2b (p-values for two-sample t-tests on workers’ error rates in Session 2 are 0.844 and 0.384 for Experiments 2a and 2b, respectively), but there is a significant difference in work quality in Session 2 between the two treatments in Experiment 2c. Together with our findings in Section 3.5.3, these results imply that simply enabling direct communication between pairs of workers who work on the same microtasks does not help workers improve their independent work quality in future tasks, regardless of whether those future tasks share related concepts with the tasks they have discussed with co-workers. In addition, simply providing expert feedback after the peer communication procedure cannot enhance workers’ future independent performance on randomly selected tasks of the same type, either. Nevertheless, it seems that peer communication, when combined with expert feedback, can lead to improved independent work quality on future tasks that are conceptually related to the tasks where peer communication takes place.

(a) Experiment 2a results
(b) Experiment 2b results
(c) Experiment 2c results
Figure 10: Comparison of workers’ average error rates in the three sets of experiments in which we examine whether workers improve the quality of independent work after they participate in tasks with peer communication.

One may wonder why peer communication, by itself, can hardly influence workers’ independent work quality in future related tasks (i.e., the results of Experiment 2a), but can influence workers’ independent work quality when it is combined with expert feedback (i.e., the results of Experiment 2c). We provide two explanations. First, by looking into the chat logs, we find that while workers often discover the underlying concept for a task through the peer communication procedure, their understanding of that concept is often context-specific and not always generalizable to a different context. For example, given two tasks that share the same topic of “nuts are important sources of protein,” a pair of workers might have successfully concluded in one task, through discussion, that the peanut butter in one meal has more protein than the banana in the other meal. However, when they are asked to complete the related task on their own, they face a choice between a meal with peanut butter and a meal with cream cheese, for which the knowledge about peanut butter that they learned previously is not entirely applicable. In other words, workers may lack the ability to extract transferable knowledge from concrete tasks through a short period of peer communication (e.g., within 2 microtasks).

Perhaps more importantly, we argue that the improvement in workers’ independent work quality after participating in tasks with peer communication and expert feedback is largely due to the expert feedback rather than the peer communication procedure. To show this, we conduct a follow-up experiment with the same two-treatment design as Experiment 2c, except that we also provide expert feedback on the first two independent tasks of Treatment 1. In this way, by comparing the work quality produced in Session 2 of the two treatments, we can tell whether the peer communication procedure provides any additional boost to workers’ independent performance beyond the improvement brought about by expert feedback. Our experimental results on 494 workers give a negative answer—on average, workers’ error rates in Session 2 of Treatments 1 and 2 are 22.7% and 27.3%, respectively, and the difference is not statistically significant.

Overall, our examination of the effects of peer communication on workers’ independent work quality in future tasks suggests a limited impact. In other words, peer communication may not be a very effective approach to “training” workers towards a higher level of independent performance, at least when workers work on microtasks for a relatively short period of time.

5 Discussion

In this paper, we have studied the effects of direct interactions between workers in crowdwork. In particular, we have explored whether introducing peer communication in tasks can enhance work quality in those tasks as well as improve workers’ independent performance in future tasks of the same type. Our results indicate a robust improvement in work quality when pairs of workers can directly communicate with each other, and such improvement is consistently observed across different types of tasks. On the other hand, we also find that allowing workers to communicate with each other in some tasks has limited impact on improving workers’ independent performance in tasks of the same type in the future.

5.1 Design Implications

Our consistent observation of improved work quality in tasks with peer communication points to an alternative way of organizing microtasks in crowdwork: instead of having workers solve microtasks independently, practitioners may consider systematically organizing crowd workers to work in pairs and enabling direct, synchronous, and free-style interactions between them. In some sense, our results suggest the promise and potential benefits of “working in pairs” as a new baseline approach to organizing crowdwork. On the other hand, introducing peer communication also requires requesters to synchronize the work pace of different workers. Practitioners therefore need to weigh the quality improvement brought about by peer communication against the extra cost of synchronization before deciding how to organize their crowdwork.
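To make the synchronization cost concrete, the sketch below shows one hypothetical way a requester might pair workers as they accept a task: a worker waits in a queue until a partner arrives, and only then does the pair proceed to the discussion phase. This is an illustrative design under our own assumptions, not the system used in the paper; the class and method names are made up for the example.

```python
# Minimal sketch of a pairing queue for synchronous peer communication.
# Assumption: workers accept tasks one at a time and are matched first-come,
# first-served; names (PairingQueue, on_worker_arrival) are hypothetical.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class PairingQueue:
    waiting: deque = field(default_factory=deque)

    def on_worker_arrival(self, worker_id: str):
        """Return a (worker, partner) pair if one can be formed, else None."""
        if self.waiting:
            partner = self.waiting.popleft()
            return (partner, worker_id)   # pair formed; start the discussion phase
        self.waiting.append(worker_id)    # no partner yet; worker idles
        return None

queue = PairingQueue()
for w in ["w1", "w2", "w3"]:
    pair = queue.on_worker_arrival(w)
    print(f"{w}: {'paired as ' + str(pair) if pair else 'waiting for a partner'}")
```

The idle time of the unmatched worker (here, w1 until w2 arrives, and w3 indefinitely) is precisely the synchronization overhead discussed above; a requester would need to compensate that waiting time or set a timeout after which an unmatched worker falls back to working independently.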

It is worth mentioning that while our experimental results show the advantage of introducing peer communication in crowdwork for many different types of tasks, we cannot rule out the possibility that for some specific types of tasks, peer communication is not helpful or is even harmful. Previous studies have reported phenomena like groupthink [19], in which communication can actually hurt individual performance. Therefore, more experimental research is needed to thoroughly understand the relationship between task properties and whether peer communication helps on those tasks.

5.2 Limitations

While our results are overall robust and consistent, our specific experimental design and choice of tasks come with a few limitations.

First, our experiments span only a short period of time (i.e., six microtasks), and workers can communicate with each other in only two of them. This short period of interaction could be a bottleneck that prevents workers from truly learning the underlying concepts or knowledge needed to improve their independent performance. Indeed, in educational settings, students typically take part in a course that spans hours or even months, so their improved learning in courses with peer instruction could be attributed to repeated exposure to the peer instruction process. In this sense, our observation that peer communication is not an effective tool for training workers could simply be due to these short interactions. Exploring the long-term impact of peer communication on crowdwork is thus an interesting future direction.

Moreover, our current experiments focus exclusively on microtasks, so it is unclear whether the results observed in this study generalize to more complex tasks. In particular, much previous work has shown that implicit worker interactions, in the form of workers receiving feedback from other workers or reviewing other workers’ output, can be an effective method for training workers towards better independent performance. We conjecture that our conclusion that peer communication is not a very effective training method is partly limited by the nature of microtasks, and examining the effectiveness of peer communication for more complex tasks is a direction that warrants further study.

5.3 Future Work

In addition to the interesting future directions discussed above, there are a few more topics that we are particularly interested in exploring.

First, our current study focuses on peer communication between pairs of workers in crowdwork. Can our approach be generalized to interactions involving more than two workers? How should we deal with additional complexities, such as social loafing [27], when more than two workers are involved in the communication? It would be interesting to explore the role of communication in multi-worker collaboration in crowdwork.

Second, in this study, we focus on implementing the component of worker interactions. However, in peer instruction in education, instructor intervention has a large impact on student learning. It is natural to ask whether we can further improve the quality of crowdwork through requester intervention. For example, if most workers already agree on an answer, there is a good chance the answer is correct, and the requester can skip the discussion phase to improve efficiency. More generally, can the requester further improve work quality by dynamically intervening in the peer communication process, e.g., by deciding whether a discussion is needed or by modifying the pairing of workers based on previous discussions?
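As one concrete illustration of such an intervention, the sketch below shows a hypothetical policy that triggers the discussion phase only when the paired workers’ initial answers disagree, or when both agree but with low self-reported confidence. This policy was not evaluated in the paper; the function name, parameters, and threshold are assumptions made for the example.

```python
# Minimal sketch of a requester intervention policy for peer communication.
# Assumption: workers submit an initial answer plus a self-reported
# confidence; names and the 0.8 threshold are illustrative, not from the paper.
def needs_discussion(answer_a: str, answer_b: str,
                     confidence_a: float, confidence_b: float,
                     confidence_threshold: float = 0.8) -> bool:
    """Decide whether to run the (costly) discussion phase for a worker pair."""
    if answer_a != answer_b:
        return True                      # disagreement: discussion likely helps
    # Even when answers agree, low confidence may still justify a discussion.
    return min(confidence_a, confidence_b) < confidence_threshold

# Example: agreeing, confident workers skip discussion to save time.
print(needs_discussion("golden retriever", "golden retriever", 0.9, 0.95))  # False
print(needs_discussion("beagle", "basset hound", 0.9, 0.95))                # True
```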

6 Conclusion

In this paper, we have explored how introducing peer communication (direct, synchronous, free-style interactions between workers) affects crowdwork. In particular, we adopt the workflow of peer instruction in educational settings and examine the effects of one-to-one interactions between pairs of workers working on the same microtasks. Experiments on Amazon Mechanical Turk demonstrate that adopting peer communication significantly increases the quality of crowdwork over the level of independent worker performance, and this improvement is robust across different types of tasks. On the other hand, we find that participating in tasks with peer communication leads to improvement in workers’ independent performance in future tasks of the same type only if expert feedback is provided at the end of the peer communication procedure and the future tasks are conceptually related to the tasks where peer communication takes place. Even then, the improvement is likely caused by the expert feedback rather than by peer communication. Overall, these results suggest that peer communication, by itself, may not be an effective method to train workers towards better performance, at least for typical microtasks on crowdsourcing platforms.

Acknowledgments

We thank all the crowd workers who participated in the experiments to make this work possible.

References

  • Ahn [2006] Luis von Ahn. Games with a purpose. Computer, 39(6):92–94, June 2006.
  • Anderson et al. [2013] Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. Steering user behavior with badges. In Proceedings of the 22nd International Conference on World Wide Web (WWW), 2013.
  • Bernstein et al. [2011] Michael S. Bernstein, Joel Brandt, Robert C. Miller, and David R. Karger. Crowds in two seconds: Enabling realtime crowd-powered interfaces. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST), 2011.
  • Bigham et al. [2010] Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samuel White, and Tom Yeh. Vizwiz: Nearly real-time answers to visual questions. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST), 2010.
  • Burgermaster et al. [2017] Marissa Burgermaster, Krzysztof Z Gajos, Patricia Davidson, and Lena Mamykina. The role of explanations in casual observational learning about nutrition. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 4097–4145. ACM, 2017.
  • Chang et al. [2017] Joseph Chee Chang, Saleema Amershi, and Ece Kamar. Revolt: Collaborative crowdsourcing for labeling machine learning datasets. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI), 2017.
  • Cholleti et al. [2008] S. R. Cholleti, S. A. Goldman, A. Blum, D. G. Politte, and S. Don. Veritas: Combining expert opinions without labeled data. In Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence, 2008.
  • Crouch and Mazur [2001] Catherine Crouch and Eric Mazur. Peer instruction: Ten years of experience and results. Am. J. Phys., 69(9):970–977, September 2001.
  • Dai et al. [2013] Peng Dai, Christopher H. Lin, Mausam, and Daniel S. Weld. Pomdp-based control of workflows for crowdsourcing. Artif. Intell., 202(1):52–85, September 2013.
  • Dawid and Skene [1979] A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 28:20–28, 1979.
  • Dempster et al. [1977] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39:1–38, 1977.
  • Doroudi et al. [2016] Shayan Doroudi, Ece Kamar, Emma Brunskill, and Eric Horvitz. Toward a learning science for complex crowdsourcing tasks. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI), 2016.
  • Drapeau et al. [2016] Ryan Drapeau, Lydia B. Chilton, Jonathan Bragg, and Daniel S. Weld. Microtalk: Using argumentation to improve crowdsourcing accuracy. In Fourth AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2016.
  • Fagen et al. [2002] Adam P. Fagen, Catherine H. Crouch, and Eric Mazur. Peer instruction: Results from a range of classrooms. The Physics Teacher, 40(4):206–209, 2002.
  • Ho et al. [2013] C. Ho, S. Jabbari, and J. W. Vaughan. Adaptive task assignment for crowdsourced classification. In The 30th International Conference on Machine Learning (ICML), 2013.
  • Ho et al. [2015] Chien-Ju Ho, Aleksandrs Slivkins, Siddharth Suri, and Jennifer Wortman Vaughan. Incentivizing high quality crowdwork. In Proceedings of the 24th International Conference on World Wide Web (WWW), 2015.
  • Horton and Chilton [2010] John Joseph Horton and Lydia B. Chilton. The labor economics of paid crowdsourcing. In Proceedings of the 11th ACM conference on Electronic commerce (EC), 2010.
  • Jain et al. [2009] S. Jain, Y. Chen, and D.C. Parkes. Designing incentives for online question and answer forums. In Proceedings of the 10th ACM conference on Electronic commerce (EC), 2009.
  • Janis [1982] I.L. Janis. Groupthink: Psychological Studies of Policy Decisions and Fiascoes. Houghton Mifflin, 1982.
  • Jin and Ghahramani [2003] R. Jin and Z. Ghahramani. Learning with multiple labels. In Advances in Neural Information Processing Systems (NIPS), 2003.
  • Karger et al. [2011a] D. R. Karger, S. Oh, and D. Shah. Iterative learning for reliable crowdsourcing systems. In The 25th Annual Conference on Neural Information Processing Systems (NIPS), 2011a.
  • Karger et al. [2011b] D. R. Karger, S. Oh, and D. Shah. Budget-optimal crowdsourcing using low-rank matrix approximations. In Proc. 49th Annual Conference on Communication, Control, and Computing (Allerton), 2011b.
  • Khosla et al. [2011] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
  • Kittur et al. [2011] Aniket Kittur, Boris Smus, Susheel Khamkar, and Robert E. Kraut. Crowdforge: Crowdsourcing complex work. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST), 2011.
  • Kulkarni et al. [2012] Anand Kulkarni, Matthew Can, and Björn Hartmann. Collaboratively crowdsourcing workflows with turkomatic. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (CSCW), 2012.
  • Lasry et al. [2008] Nathaniel Lasry, Eric Mazur, and Jessica Watkins. Peer instruction: From harvard to the two-year college. American Journal of Physics, 76(11):1066–1069, 2008.
  • Latané et al. [1979] Bibb Latané, Kipling Williams, and Stephen Harkins. Many hands make light the work: The causes and consequences of social loafing. Journal of Personality and Social Psychology, 37(6):822–832, 1979.
  • Law et al. [2016] Edith Law, Ming Yin, Joslin Goh, Kevin Chen, Michael A. Terry, and Krzysztof Z. Gajos. Curiosity killed the cat, but makes crowdwork better. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI), 2016.
  • Little et al. [2010] Greg Little, Lydia B. Chilton, Max Goldman, and Robert C. Miller. Turkit: Human computation algorithms on mechanical turk. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST), 2010.
  • Mason and Watts [2009] Winter Mason and Duncan Watts. Financial incentives and the “performance of crowds”. In Proceedings of the 1st Human Computation Workshop (HCOMP), 2009.
  • Mazur [2017] Eric Mazur. Peer instruction. In Peer Instruction, pages 9–19. Springer, 2017.
  • Niculae and Danescu-Niculescu-Mizil [2016] Vlad Niculae and Cristian Danescu-Niculescu-Mizil. Conversational markers of constructive discussions. In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2016.
  • Noronha et al. [2011] Jon Noronha, Eric Hysen, Haoqi Zhang, and Krzysztof Z. Gajos. Platemate: Crowdsourcing nutritional analysis from food photographs. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST), 2011.
  • Porter et al. [2011] Leo Porter, Cynthia Bailey Lee, Beth Simon, and Daniel Zingaro. Peer instruction: do students really learn from peer discussion in computing? In Proceedings of the seventh international workshop on Computing education research, 2011.
  • Raykar et al. [2010] V. Raykar, S. Yu, L. Zhao, G. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. Journal of Machine Learning Research, 11:1297–1322, 2010.
  • Retelny et al. [2014] Daniela Retelny, Sébastien Robaszkiewicz, Alexandra To, Walter S. Lasecki, Jay Patel, Negar Rahmati, Tulsee Doshi, Melissa Valentine, and Michael S. Bernstein. Expert crowdsourcing with flash teams. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (UIST), 2014.
  • [37] Jakob Rogstadius, Vassilis Kostakos, Aniket Kittur, Boris Smus, Jim Laredo, and Maja Vukovic. An assessment of intrinsic and extrinsic motivation on task performance in crowdsourcing markets. In 5th International AAAI Conference on Weblogs and Social Media (ICWSM).
  • Shah and Zhou [2015] Nihar Bhadresh Shah and Denny Zhou. Double or nothing: Multiplicative incentive mechanisms for crowdsourcing. In The 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015.
  • Shaw et al. [2011] Aaron D. Shaw, John J. Horton, and Daniel L. Chen. Designing incentives for inexpert human raters. In Proceedings of the ACM 2011 conference on Computer supported cooperative work (CSCW), 2011.
  • Whitehill et al. [2009] J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Advances in Neural Information Processing Systems (NIPS), 2009.
  • Yin et al. [2013] Ming Yin, Yiling Chen, and Yu-An Sun. The effects of performance-contingent financial incentives in online labor markets. In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI), 2013.
  • Zhu et al. [2014] Haiyi Zhu, Steven P. Dow, Robert E. Kraut, and Aniket Kittur. Reviewing versus doing: Learning and performance in crowd assessment. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW), 2014.