Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

by Qiang Ning, et al.

High-quality, large-scale data are key to the success of AI systems. However, large-scale data annotation efforts often confront a common set of challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) ensuring reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and pipelines saved in a reusable format. We show that Crowdaq significantly simplifies data annotation across a diverse set of data collection use cases, and we hope it will be a convenient tool for the community.

