Mephisto: A Framework for Portable, Reproducible, and Iterative Crowdsourcing

01/12/2023
by Jack Urbanek, et al.

We introduce Mephisto, a framework to make crowdsourcing for research more reproducible, transparent, and collaborative. Mephisto provides abstractions that cover a broad set of task designs and data collection workflows, and a simple user experience that makes best practices the easy default. In this whitepaper we discuss the current state of data collection and annotation in ML research, establish the motivation for building a shared framework that enables researchers to create and open-source data collection and annotation tools as part of their publications, and outline a set of suggested requirements for a system that facilitates these goals. We then step through how Mephisto addresses these requirements, explaining the abstractions we use and our design decisions around the user experience, and sharing implementation details and where they align with the original motivations. We also discuss current limitations and future work toward continuing to deliver on the framework's initial goals. Mephisto is available as an open source project, and its documentation can be found at www.mephisto.ai.


