Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models

The popularity of social media has brought with it problems such as hate speech and sexism. Identifying and classifying sexism in social media are highly relevant tasks, as they could help build a healthier social environment. However, these tasks are considerably challenging. This work proposes a system that uses multilingual and monolingual BERT models, together with data point translation and ensemble strategies, for sexism identification and classification in English and Spanish. It was conducted in the context of the sEXism Identification in Social neTworks (EXIST 2021) shared task, proposed by the Iberian Languages Evaluation Forum (IberLEF). The proposed system and its main components are described, and an in-depth hyperparameter analysis is conducted. The main findings were: (i) the system obtained better results than the baseline model (multilingual BERT); (ii) ensemble models obtained better results than monolingual models; and (iii) an ensemble model that considers all individual models and the best standardized values obtained the highest accuracies and F1-scores for both tasks. This work obtained first place in both tasks at EXIST, with the highest accuracies (0.780 for task 1 and 0.658 for task 2) and F1-scores (F1-binary of 0.780 for task 1 and F1-macro of 0.579 for task 2).
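The abstract does not say how the individual models' outputs are combined. The sketch below is a minimal illustration that assumes soft voting, where each model's per-class probabilities are averaged and the highest-scoring class is taken. It is not the authors' implementation, and all function names are hypothetical. The sketch also shows the two reported metrics. F1-binary is the F1-score of the positive (sexist) class, and F1-macro is the unweighted mean of per-class F1-scores.

```python
from typing import Sequence


def soft_vote(prob_lists: Sequence[Sequence[Sequence[float]]]) -> list[int]:
    """Hypothetical soft-voting ensemble (an assumption, not the paper's method).

    prob_lists[m][i][c] is model m's probability for class c on example i.
    Returns the argmax of the averaged probabilities for each example.
    """
    n_models = len(prob_lists)
    preds = []
    for i in range(len(prob_lists[0])):
        n_classes = len(prob_lists[0][i])
        # Average each class probability across all models.
        avg = [sum(model[i][c] for model in prob_lists) / n_models
               for c in range(n_classes)]
        # Pick the class with the highest averaged probability.
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1-score of one class; with cls = positive class this is F1-binary."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_macro(y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of per-class F1-scores (F1-macro)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Toy probabilities from three hypothetical binary classifiers
    # (class 0 = non-sexist, class 1 = sexist); not data from the paper.
    model_a = [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
    model_b = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]]
    model_c = [[0.9, 0.1], [0.45, 0.55], [0.6, 0.4]]
    preds = soft_vote([model_a, model_b, model_c])
    gold = [0, 1, 1]
    print(preds)                            # [0, 0, 1]
    print(f1_for_class(gold, preds, 1))     # F1-binary for the sexist class
    print(f1_macro(gold, preds, [0, 1]))    # F1-macro over both classes
```

Soft voting is only one plausible combination rule. Majority (hard) voting or weighted averaging would fit the same interface by changing how `avg` is computed.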
