An Information Retrieval Approach to Building Datasets for Hate Speech Detection

by Md Mustafizur Rahman, et al.

Building a benchmark dataset for hate speech detection presents several challenges. Firstly, because hate speech is relatively rare (e.g., less than 3% of Twitter posts are hateful <cit.>), randomly sampling tweets to annotate is an inefficient way to capture hate speech. A common practice is to annotate only tweets containing known “hate words”, but this risks yielding a biased benchmark that only partially captures the real-world phenomenon of interest. A second challenge is that definitions of hate speech tend to be highly variable and subjective. Annotators with diverse prior notions of hate speech may not only disagree with one another but also struggle to conform to the specified labeling guidelines. Our key insight is that the rarity and subjectivity of hate speech are akin to those of relevance in information retrieval (IR). This connection suggests that well-established methodologies for creating IR test collections might also be usefully applied to create better benchmark datasets for hate speech detection. Firstly, to select which tweets to annotate intelligently and efficiently, we apply the established IR techniques of pooling and active learning. Secondly, to improve both the consistency and the value of annotations, we apply task decomposition <cit.> and annotator rationale <cit.> techniques. Using the above techniques, we create and share a new benchmark dataset[We will release the dataset upon publication.] for hate speech detection with broader coverage than prior datasets. We also show a dramatic drop in the accuracy of existing detection models when tested on these broader forms of hate. Collected annotator rationales not only provide documented support for labeling decisions but also create exciting opportunities for future work on dual supervision and/or explanation generation in modeling.
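The abstract names pooling and active learning as the tweet-selection techniques but does not spell out their details. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes classic depth-k pooling (the union of the top-k tweets from several ranked lists, as in IR test-collection building) followed by uncertainty sampling, where the unlabeled pooled tweets whose predicted hate probability is closest to 0.5 are sent to annotators first. The function names, system names, tweet IDs, and probabilities are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def depth_k_pool(rankings: Dict[str, Sequence[str]], k: int) -> List[str]:
    """Hypothetical helper: union of the top-k tweet IDs from each ranked list.

    Duplicates are kept once, in first-seen order; systems are visited in
    sorted-name order so the result is deterministic.
    """
    pool: List[str] = []
    seen: Set[str] = set()
    for system in sorted(rankings):
        for tweet_id in rankings[system][:k]:
            if tweet_id not in seen:
                seen.add(tweet_id)
                pool.append(tweet_id)
    return pool


def uncertainty_sample(
    pool: Sequence[str],
    probs: Dict[str, float],
    budget: int,
    labeled: Set[str],
) -> List[str]:
    """Hypothetical helper: pick up to `budget` unlabeled pooled tweets.

    Assumes `probs` maps each tweet ID to a classifier's estimated probability
    of being hateful; tweets nearest 0.5 (most uncertain) are chosen first,
    with ties broken by tweet ID.
    """
    candidates = [t for t in pool if t not in labeled]
    candidates.sort(key=lambda t: (abs(probs[t] - 0.5), t))
    return candidates[:budget]


if __name__ == "__main__":
    # Illustrative rankings from two assumed systems (e.g. a lexical search
    # and a classifier); IDs are made up for this example.
    rankings = {"bm25": ["t1", "t2", "t3"], "clf": ["t3", "t4", "t5"]}
    pool = depth_k_pool(rankings, k=2)
    probs = {"t1": 0.9, "t2": 0.55, "t3": 0.1, "t4": 0.48}
    print(pool)
    print(uncertainty_sample(pool, probs, budget=2, labeled={"t1"}))
```

Pooling widens coverage beyond a fixed hate-word list by drawing candidates from several retrieval sources, while uncertainty sampling concentrates the annotation budget on the borderline cases a model finds hardest; in a real active-learning loop the classifier would be retrained after each labeled batch.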


Emojis as Anchors to Detect Arabic Offensive Language and Hate Speech

We introduce a generic, language-independent method to collect a large p...

Information Retrieval in African Languages

Developing Information Retrieval (IR) tools and techniques in African la...

Antisemitic Messages? A Guide to High-Quality Annotation and a Labeled Dataset of Tweets

One of the major challenges in automatic hate speech detection is the la...

EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

This article introduces a new language-independent approach for creating...

Speech-to-Speech Translation For A Real-world Unwritten Language

We study speech-to-speech translation (S2ST) that translates speech from...

PolyHope: Two-Level Hope Speech Detection from Tweets

Hope is characterized as openness of spirit toward the future, a desire,...

On the Challenges of Building Datasets for Hate Speech Detection

Detection of hate speech has been formulated as a standalone application...
