Annotation Curricula to Implicitly Train Non-Expert Annotators

06/04/2021
by Ji-Ung Lee, et al.

Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming at the beginning, mentally taxing, and can introduce errors into the resulting annotations, especially in citizen science or crowdsourcing scenarios where domain expertise is not required and only annotation guidelines are provided. To alleviate these issues, we propose annotation curricula, a novel approach to implicitly train annotators. Our goal is to gradually introduce annotators to the task by ordering the instances to be annotated according to a learning curriculum. To do so, we first formalize annotation curricula for sentence- and paragraph-level annotation tasks, define an ordering strategy, and identify well-performing heuristics and interactively trained models on three existing English datasets. We then conduct a user study with 40 voluntary participants who are asked to identify the most fitting misconception for English tweets about the Covid-19 pandemic. Our results show that using a simple heuristic to order instances can already significantly reduce the total annotation time while preserving high annotation quality. Annotation curricula can thus provide a novel way to improve data collection. To facilitate future research, we further share our code and data consisting of 2,400 annotations.
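The ordering idea described in the abstract can be illustrated with a minimal sketch. It assumes that token count serves as the difficulty heuristic; the abstract does not say which heuristic the authors used, so `token_count`, `order_by_curriculum`, and the placeholder strings below are hypothetical and chosen only for illustration.

```python
from typing import Callable, List


def token_count(text: str) -> int:
    """Assumed difficulty proxy (not taken from the paper):
    the number of whitespace-separated tokens in the instance."""
    return len(text.split())


def order_by_curriculum(
    instances: List[str],
    difficulty: Callable[[str], float] = token_count,
) -> List[str]:
    """Return a new list sorted from easiest to hardest under `difficulty`.

    sorted() is stable, so instances with equal scores keep their
    original relative order.
    """
    return sorted(instances, key=difficulty)


# Made-up placeholder instances, not drawn from the released data.
instances = [
    "a considerably longer instance with many more tokens to read",
    "short instance",
    "an instance of medium length here",
]

curriculum = order_by_curriculum(instances)
for position, text in enumerate(curriculum, start=1):
    print(position, text)
```

Any other scoring function, such as a readability score or a model-based estimate, could be passed as `difficulty` without changing the ordering step itself.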

research
10/13/2022

ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution

Large-scale, high-quality corpora are critical for advancing research in...
research
08/16/2022

TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

Collecting and annotating task-oriented dialog data is difficult, especi...
research
07/27/2021

A Biomedically oriented automatically annotated Twitter COVID-19 Dataset

The use of social media data, like Twitter, for biomedical research has ...
research
05/27/2021

Investigating label suggestions for opinion mining in German Covid-19 social media

This work investigates the use of interactively updated label suggestion...
research
12/14/2019

#MeTooMA: Multi-Aspect Annotations of Tweets Related to the MeToo Movement

In this paper, we present a dataset containing 9,973 tweets related to t...
research
06/24/2023

Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise?

We evaluated the capability of generative pre-trained transformers (GPT-...
research
05/19/2019

Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction

Modern NLP systems require high-quality annotated data. In specialized d...
