Automatic Sexism Detection with Multilingual Transformer Models

06/09/2021
by Schütz Mina, et al.

Sexism has become an increasingly serious problem on social networks in recent years. The first shared task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 is an international competition in Natural Language Processing (NLP). It aims to identify sexism in social media content automatically by applying machine learning methods. Sexism detection is formulated both as a coarse (binary) classification problem and as a fine-grained classification task that distinguishes multiple types of sexist content (e.g., dominance, stereotyping, and objectification). This paper presents the contribution of the AIT_FHSTP team to the EXIST2021 benchmark for both tasks. To solve the tasks, we applied two multilingual transformer models, one based on multilingual BERT and one based on XLM-R. Our approach uses two strategies to adapt the transformers to the detection of sexist content: first, unsupervised pre-training with additional data, and second, supervised fine-tuning with additional and augmented data. For both tasks, our best model is XLM-R with unsupervised pre-training on the EXIST data and additional datasets, followed by fine-tuning on the provided dataset. The best run for the binary classification (task 1) achieves a macro F1-score of 0.7752 and ranks 5th in the benchmark. For the multiclass classification (task 2), our best submission ranks 6th with a macro F1-score of 0.5589.
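Both tasks are ranked by macro F1-score, the unweighted mean of the per-class F1-scores. Under this metric a rare category of sexist content counts as much as the majority class. The following is a minimal sketch of the metric in plain Python. The label strings and toy predictions are hypothetical illustrations, not EXIST data or the official category names. The official EXIST scorer may also differ in details, such as how it handles classes that never occur.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Macro averaging: every class gets the same weight regardless of its frequency.
    return sum(scores) / len(scores)


# Hypothetical toy labels for a binary (task 1 style) setting, not EXIST data.
binary_true = ["sexist", "sexist", "non-sexist", "non-sexist"]
binary_pred = ["sexist", "non-sexist", "non-sexist", "non-sexist"]
print(round(macro_f1(binary_true, binary_pred), 4))  # → 0.7333

# Hypothetical toy labels for a multiclass (task 2 style) setting.
multi_true = ["stereotyping", "objectification", "non-sexist", "objectification"]
multi_pred = ["stereotyping", "non-sexist", "non-sexist", "objectification"]
print(round(macro_f1(multi_true, multi_pred), 4))  # → 0.7778
```

In the binary toy case, accuracy would be 0.75. The macro F1 of about 0.733 is lower because the missed sexist example pulls down the F1 of the smaller "sexist" class. For real evaluations, scikit-learn's `f1_score(y_true, y_pred, average="macro")` should compute the same quantity.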

research
04/10/2023

Attention at SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS)

In this paper, we have worked on interpretability, trust, and understand...
01/09/2021

Task Adaptive Pretraining of Transformers for Hostility Detection

Identifying adverse and hostile content on the web and more particularly...
01/08/2021

Leveraging Multilingual Transformers for Hate Speech Detection

Detecting and classifying instances of hate in social media text has bee...
01/11/2022

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild

This paper presents a new training dataset for automatic genre identific...
03/27/2023

Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand Safety

The rapid growth in user generated content on social media has resulted ...
11/08/2021

Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models

The popularity of social media has created problems such as hate speech ...
12/27/2022

Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage

Content moderation is the process of screening and monitoring user-gener...