Challenges and strategies for running controlled crowdsourcing experiments

by Jorge Ramirez, et al.

This paper reports on the challenges and lessons we learned while running controlled experiments on crowdsourcing platforms. Crowdsourcing is becoming an attractive technique for engaging a large and diverse pool of subjects in experimental research, allowing researchers to achieve levels of scale and completion times that would not be feasible in lab settings. However, this scale and flexibility come at the cost of multiple, and sometimes unknown, sources of bias and confounding factors that arise from the technical limitations of crowdsourcing platforms and from the challenges of running controlled experiments "in the wild". In this paper, we take our experience in running systematic evaluations of task design as a motivating example to explore, describe, and quantify the potential impact of running uncontrolled crowdsourcing experiments, and to derive possible coping strategies. The challenges we identify include sampling bias, controlling the assignment of subjects to experimental conditions, learning effects, and the reliability of crowdsourcing results. According to our empirical studies, potential biases and confounding factors can amount to a 38% loss in the utility of the data collected in uncontrolled settings, and they can significantly change the outcome of experiments. These issues ultimately inspired us to implement CrowdHub, a system that sits on top of major crowdsourcing platforms and allows researchers and practitioners to run controlled crowdsourcing projects.
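Two of the challenges named above, controlling the assignment of subjects to conditions and avoiding learning effects from repeat participants, can be illustrated with a minimal sketch. This is a hypothetical helper, not CrowdHub's actual implementation: it rejects workers who have already participated and keeps conditions balanced by always assigning to a least-filled one.

```python
import random

def assign_condition(worker_id, conditions, seen, counts):
    """Assign a worker to exactly one experimental condition.

    Returns None for repeat participants, so no subject appears in
    more than one condition (guarding against learning effects).
    Assignment stays balanced by picking among least-filled conditions.
    """
    if worker_id in seen:
        return None  # repeat participant: exclude from the experiment
    seen.add(worker_id)
    # Find conditions with the fewest subjects so far and pick one at random.
    least = min(counts[c] for c in conditions)
    choice = random.choice([c for c in conditions if counts[c] == least])
    counts[choice] += 1
    return choice

conditions = ["A", "B"]
seen, counts = set(), {"A": 0, "B": 0}
for i in range(10):
    assign_condition(f"worker{i}", conditions, seen, counts)
# After 10 unique workers, the two conditions hold 5 subjects each,
# and a returning worker is rejected:
assert assign_condition("worker0", conditions, seen, counts) is None
```

In practice, the `seen` set would be persisted across tasks and platforms, since crowd workers can reach the same experiment through different channels; that cross-platform bookkeeping is part of what a layer like CrowdHub provides.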




