Antisemitic Messages? A Guide to High-Quality Annotation and a Labeled Dataset of Tweets

04/28/2023
by Gunther Jikeli, et al.

One of the major challenges in automatic hate speech detection is the lack of datasets that cover a wide range of biased and unbiased messages and that are consistently labeled. We propose a labeling procedure that addresses some of the common weaknesses of labeled datasets. We focus on antisemitic speech on Twitter and create a labeled dataset of 6,941 tweets that cover a wide range of topics common in conversations about Jews, Israel, and antisemitism between January 2019 and December 2021 by drawing from representative samples with relevant keywords. Our annotation process aims to strictly apply a commonly used definition of antisemitism by forcing annotators to specify which part of the definition applies, and by giving them the option to personally disagree with the definition on a case-by-case basis. Labeling tweets that call out antisemitism, report antisemitism, or are otherwise related to antisemitism (such as the Holocaust) but are not actually antisemitic can help reduce false positives in automated detection. The dataset includes 1,250 tweets (18%) that are antisemitic according to the International Holocaust Remembrance Alliance (IHRA) definition of antisemitism. It is important to note, however, that the dataset is not comprehensive. Many topics are still not covered, and it includes only tweets posted between January 2019 and December 2021. Additionally, the dataset includes only tweets written in English. Despite these limitations, we hope that this is a meaningful contribution to improving the automated detection of antisemitic speech.
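
The labeling constraint described in the abstract (a positive label must name the part of the definition that applies, and the annotator may separately record personal disagreement) can be illustrated with a small sketch. This is not the authors' annotation tool or data schema: the class `Annotation`, its field names, and the helper `positive_share` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Annotation:
    """One annotator's label for one tweet (hypothetical schema)."""

    tweet_id: str
    # Label under the working definition (the abstract uses the IHRA definition).
    antisemitic_per_definition: bool
    # Which part of the definition applies; required for positive labels.
    definition_part: Optional[str] = None
    # Annotator's option to personally disagree with the definition for this tweet.
    annotator_disagrees: bool = False
    # Tweet calls out or reports antisemitism without being antisemitic itself.
    calls_out_antisemitism: bool = False

    def __post_init__(self) -> None:
        # Enforce the rule that a positive label names the applicable part.
        if self.antisemitic_per_definition and not self.definition_part:
            raise ValueError(
                "a positive label must name the part of the definition that applies"
            )
        # A negative label should not cite a part of the definition.
        if not self.antisemitic_per_definition and self.definition_part:
            raise ValueError("a negative label must not cite a definition part")


def positive_share(annotations: Sequence[Annotation]) -> float:
    """Fraction of annotations labeled antisemitic under the definition."""
    if not annotations:
        return 0.0
    positives = sum(a.antisemitic_per_definition for a in annotations)
    return positives / len(annotations)
```

With the figures reported in the abstract, the same ratio is 1,250 / 6,941, or roughly 0.18. Keeping the annotator's disagreement in a separate field means the definition-based label stays consistent across annotators while dissent remains visible.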
