Lessons Learned from a Citizen Science Project for Natural Language Processing

04/25/2023
by   Jan-Christoph Klie, et al.

Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science.


research
05/06/2021

On the Ethical Limits of Natural Language Processing on Legal Text

Natural language processing (NLP) methods for analyzing legal text offer...
research
09/13/2022

The Role of Explanatory Value in Natural Language Processing

A key aim of science is explanation, yet the idea of explaining language...
research
12/13/2018

Towards a General-Purpose Linguistic Annotation Backend

Language documentation is inherently a time-intensive process; transcrip...
research
06/14/2023

Operationalising Representation in Natural Language Processing

Despite its centrality in the philosophy of cognitive science, there has...
research
05/24/2023

You Are What You Annotate: Towards Better Models through Annotator Representations

Annotator disagreement is ubiquitous in natural language processing (NLP...
research
11/08/2019

Generating relevant scenarios for intelligent transportation service

This paper addresses risk assessment issues while conceiving complex sys...
research
12/14/2021

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

Labelled data is the foundation of most natural language processing task...
