Ground-Truth, Whose Truth? – Examining the Challenges with Annotating Toxic Text Datasets

12/07/2021
by   Kofi Arhin, et al.

The use of machine learning (ML)-based language models (LMs) to monitor content online is on the rise. For toxic text identification, task-specific fine-tuning of these models is performed using datasets labeled by annotators, who provide ground-truth labels intended to distinguish offensive from normal content. These projects have led to the development, improvement, and expansion of large datasets over time and have contributed immensely to natural language research. Despite these achievements, existing evidence suggests that ML models built on such datasets do not always produce desirable outcomes. Therefore, using a design science research (DSR) approach, this study examines selected toxic text datasets with the goal of shedding light on some of their inherent issues and contributing to discussions on navigating these challenges in existing and future projects. To achieve this goal, we re-annotate samples from three toxic text datasets and find that a multi-label approach to annotating toxic text samples can help improve dataset quality. While this approach may not improve the traditional metric of inter-annotator agreement, it may better capture dependence on context and diversity among annotators. We discuss the implications of these results for both theory and practice.
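The abstract contrasts single-label annotation (where traditional inter-annotator agreement is computed) with a multi-label scheme in which each annotator may assign several toxicity categories to a sample. The sketch below is purely illustrative, not the paper's method: it represents each annotator's judgment as a set of labels and computes a soft, Jaccard-based pairwise agreement for one sample. The category names are hypothetical.

```python
from itertools import combinations

def soft_agreement(a, b):
    """Jaccard overlap between two annotators' label sets for one sample.

    Two empty sets (both annotators saw no toxicity) count as full agreement.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical multi-label annotations for a single text sample:
# each annotator assigns a *set* of categories instead of one binary tag.
sample_annotations = [
    {"hate_speech", "profanity"},   # annotator 1
    {"profanity"},                  # annotator 2
    {"hate_speech", "profanity"},   # annotator 3
]

# Average soft agreement over all annotator pairs for this sample.
pairs = list(combinations(sample_annotations, 2))
avg = sum(soft_agreement(a, b) for a, b in pairs) / len(pairs)
print(round(avg, 3))  # → 0.667
```

Under a strict single-label view, annotators 1 and 2 simply disagree; the set-based view shows they partially overlap, which is the kind of nuance a multi-label scheme can preserve even when a conventional agreement statistic drops.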


