What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?

06/01/2021
by Nikita Nangia et al.

Crowdsourcing is widely used to create data for common natural language understanding tasks. Despite the importance of these datasets for measuring and refining model understanding of language, there has been little focus on the crowdsourcing methods used to collect them. In this paper, we compare the efficacy of interventions that have been proposed in prior work as ways of improving data quality. We use multiple-choice question answering as a testbed and run a randomized trial by assigning crowdworkers to write questions under one of four different data collection protocols. We find that asking workers to write explanations for their examples is an ineffective stand-alone strategy for boosting NLU example difficulty. However, we find that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments, is an effective means of collecting challenging data. In contrast, using crowdsourced judgments instead of expert judgments to qualify workers and send feedback does not prove effective. We observe that the data from the iterative protocol with expert assessments is more challenging by several measures. Notably, the human–model gap on the unanimous-agreement portion of this data is, on average, twice as large as the gap for the baseline protocol data.
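The iterative protocol described in the abstract can be read as a loop of collection, expert grading, per-worker feedback, and re-qualification. The Python sketch below illustrates that flow under stated assumptions only; the function names, data structures, and the 0.7 qualification threshold are hypothetical and are not taken from the paper's actual pipeline.

from typing import Callable, List, Tuple

def iterative_collection(
    workers: List[str],
    collect_batch: Callable[[str, int], List[dict]],        # worker -> written examples
    expert_grade: Callable[[dict], float],                   # expert quality score in [0, 1]
    send_feedback: Callable[[str, List[Tuple[dict, float]]], None],
    rounds: int = 3,
    batch_size: int = 10,
    pass_threshold: float = 0.7,
) -> List[dict]:
    """Hypothetical sketch of an expert-in-the-loop collection protocol:
    collect a batch from each qualified worker, have experts grade it,
    send the graded examples back as feedback, and re-qualify workers
    for the next round based on their mean score."""
    dataset: List[dict] = []
    qualified = list(workers)
    for _ in range(rounds):
        next_qualified = []
        for worker in qualified:
            batch = collect_batch(worker, batch_size)
            graded = [(example, expert_grade(example)) for example in batch]
            send_feedback(worker, graded)                     # per-example expert feedback
            mean_score = sum(score for _, score in graded) / max(len(graded), 1)
            if mean_score >= pass_threshold:                  # keep only strong writers
                next_qualified.append(worker)
            # retain individual examples that meet the quality bar
            dataset.extend(example for example, score in graded if score >= pass_threshold)
        qualified = next_qualified
    return dataset

In this sketch, workers who fall below the threshold are dropped from later rounds, which mirrors the qualification step the abstract describes; swapping expert_grade for a crowdworker-based grading function would correspond to the crowdsourced-judgment variant the paper finds ineffective.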

