Creating a Multimodal Dataset of Images and Text to Study Abusive Language

05/05/2020
by Alessio Palmero Aprosio, et al.

In order to study online hate speech, the availability of datasets containing the linguistic phenomena of interest is of crucial importance. However, when it comes to specific target groups, for example teenagers, collecting such data may be problematic due to issues with consent and privacy restrictions. Furthermore, while text-only datasets of this kind have been widely used, limitations set by image-based social media platforms like Instagram make it difficult for researchers to experiment with multimodal hate speech data. We therefore developed CREENDER, an annotation tool that has been used in school classes to create a multimodal dataset of images and abusive comments, which we make freely available under the Apache 2.0 license. The corpus, with Italian comments, has been analysed from different perspectives to investigate whether the subject of the images plays a role in triggering a comment. We find that users judge the same images in different ways, although the presence of a person in the picture increases the probability of receiving an offensive comment.


