Developing a Multilingual Annotated Corpus of Misogyny and Aggression

03/16/2020
by Shiladitya Bhattacharya, et al.
In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla, built as part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project). The dataset is collected from comments on YouTube videos and currently contains over 20,000 comments. The comments are annotated at two levels: aggression (overtly aggressive, covertly aggressive, and non-aggressive) and misogyny (gendered and non-gendered). We describe the data collection process, the tagset used for annotation, and the issues and challenges encountered during annotation. Finally, we discuss the results of the baseline experiments conducted to develop a classifier for misogyny in the three languages.
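
The two-level annotation scheme described in the abstract can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, and the `label_space` helper are hypothetical and chosen for illustration, and the paper's actual tag abbreviations and file format are not given in this text.

```python
from dataclasses import dataclass
from enum import Enum


class Aggression(Enum):
    """Aggression level, using the three categories named in the abstract."""
    OVERT = "overtly aggressive"
    COVERT = "covertly aggressive"
    NONE = "non-aggressive"


class Misogyny(Enum):
    """Misogyny level, using the two categories named in the abstract."""
    GENDERED = "gendered"
    NON_GENDERED = "non-gendered"


@dataclass(frozen=True)
class AnnotatedComment:
    """One annotated comment (hypothetical record layout)."""
    text: str
    language: str  # e.g. "Indian English", "Hindi", "Indian Bangla"
    aggression: Aggression
    misogyny: Misogyny


def label_space():
    """Enumerate every (aggression, misogyny) label combination."""
    return [(a, m) for a in Aggression for m in Misogyny]
```

Because each comment carries one label per level, the combined label space has 3 x 2 = 6 possible pairs, which `label_space()` enumerates.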
