Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online

04/28/2022
by Dana Ruiter, et al.

Even though hate speech (HS) online has been an important object of research over the last decade, most HS-related corpora over-simplify the phenomenon of hate by attempting to label user comments as "hate" or "neutral". This ignores the complex and subjective nature of HS, which limits the real-life applicability of classifiers trained on these corpora. In this study, we present the M-Phasis corpus, a corpus of 9k German and French user comments collected from migration-related news articles. It goes beyond the "hate"-"neutral" dichotomy and is instead annotated with 23 features, which in combination become descriptors of various types of speech, ranging from critical comments to implicit and explicit expressions of hate. The annotations are performed by 4 native speakers per language and achieve high inter-annotator agreement (0.77 ≤ κ ≤ 1). Besides describing the corpus creation and presenting insights from a content, error and domain analysis, we explore the corpus's data characteristics by training several classification baselines.
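
For readers unfamiliar with chance-corrected agreement scores such as the one reported above, the following is a minimal sketch of Cohen's kappa for two annotators labelling a single binary feature. It is an illustration only: the abstract does not state which agreement coefficient the authors computed for their 4 annotators, and the function name `cohen_kappa` and the toy labels are assumptions introduced here, not data from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items (illustrative)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations for one binary feature (1 = present, 0 = absent).
annotator_1 = [1, 0, 1, 1, 0, 0, 1, 0]
annotator_2 = [1, 0, 1, 0, 0, 0, 1, 0]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on 7 of 8 items (0.875 observed) against 0.5 expected by chance, giving a kappa of 0.75. With more than two annotators, a multi-rater coefficient would normally be used instead of this pairwise form.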
