Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

12/14/2021
by Paul Röttger, et al.

Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult, and there are often diverse valid beliefs about what the correct data labels should be. So far, dataset creators have acknowledged annotator subjectivity but have not actively managed it in the annotation process. This has led to partly subjective datasets that fail to serve a clear downstream use. To address this issue, we propose two contrasting paradigms for data annotation. The descriptive paradigm encourages annotator subjectivity, whereas the prescriptive paradigm discourages it. Descriptive annotation allows for the surveying and modelling of different beliefs, whereas prescriptive annotation enables the training of models that consistently apply one belief. We discuss the benefits and challenges of implementing both paradigms, and argue that dataset creators should explicitly aim for one or the other to facilitate the intended use of their dataset. Lastly, we design an annotation experiment to illustrate the contrast between the two paradigms.
