CReHate: Cross-cultural Re-annotation of English Hate Speech Dataset

08/31/2023
by Nayeon Lee, et al.

English datasets predominantly reflect the perspectives of certain nationalities, which can lead to cultural biases in models and datasets. This is particularly problematic in tasks heavily influenced by subjectivity, such as hate speech detection. To examine how individuals from different countries perceive hate speech, we introduce CReHate, a cross-cultural re-annotation of the sampled SBIC dataset. This dataset includes annotations from five distinct countries: Australia, Singapore, South Africa, the United Kingdom, and the United States. Our thorough statistical analysis highlights significant differences based on nationality, with only 59.4% of the samples achieving consensus among all countries. We also introduce a culturally sensitive hate speech classifier via transfer learning, adept at capturing the perspectives of different nationalities. These findings underscore the need to re-evaluate certain aspects of NLP research, especially with regard to the nuanced nature of hate speech in the English language.
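
The consensus figure reported in the abstract can plausibly be read as the share of posts on which the labels from all five countries agree. The following is a minimal Python sketch of that calculation, assuming one aggregated binary label (1 = hate, 0 = not hate) per country per post. The country codes, the `consensus_rate` helper, and the toy labels are illustrative assumptions and are not taken from CReHate or its released code.

```python
from typing import Dict, List

# Hypothetical country codes for the five annotator pools named in the abstract.
COUNTRIES = ["AU", "SG", "ZA", "GB", "US"]


def consensus_rate(labels: List[Dict[str, int]]) -> float:
    """Return the fraction of posts on which every country gives the same label.

    Each element of `labels` maps a country code to that country's
    aggregated label for one post (assumed binary here).
    """
    if not labels:
        raise ValueError("labels must not be empty")
    # A post reaches consensus when the set of its country labels has one element.
    agreed = sum(1 for post in labels if len({post[c] for c in COUNTRIES}) == 1)
    return agreed / len(labels)


# Toy data for illustration only; these are not CReHate annotations.
toy_labels = [
    {"AU": 1, "SG": 1, "ZA": 1, "GB": 1, "US": 1},  # all agree: hate
    {"AU": 0, "SG": 1, "ZA": 0, "GB": 0, "US": 0},  # one country disagrees
    {"AU": 0, "SG": 0, "ZA": 0, "GB": 0, "US": 0},  # all agree: not hate
    {"AU": 1, "SG": 0, "ZA": 1, "GB": 1, "US": 0},  # split
]

rate = consensus_rate(toy_labels)
print(f"{rate:.1%}")  # prints "50.0%"
```

How each country's label is aggregated from its individual annotators (for example, by majority vote) is left open here, since the abstract does not specify it.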


