T5 for Hate Speech, Augmented Data and Ensemble

10/11/2022
by Tosin Adewumi, et al.

We conduct relatively extensive investigations of automatic hate speech (HS) detection using different state-of-the-art (SoTA) baselines over 11 subtasks of 6 different datasets. Our motivation is to determine which of the recent SoTA models is best for automatic hate speech detection and what advantage, if any, methods like data augmentation and ensemble may have on the best model. We carry out 6 cross-task investigations. We achieve new SoTA on two subtasks: macro F1 scores of 91.73 [...] dataset, where previous SoTA are 51.52 [...]. We achieve near-SoTA on two others: macro F1 scores of 81.66 [...] 2019 dataset and 82.54 [...] 82.9 [...]. We use explainable artificial intelligence (XAI) algorithms (IG and SHAP) to reveal, through examples, how two of the models (Bi-LSTM and T5) make their predictions. Other contributions of this work are 1) the introduction of a simple, novel mechanism for correcting out-of-class (OOC) predictions in T5, 2) a detailed description of the data augmentation methods, 3) the revelation of poor data annotations in the HASOC 2021 dataset, shown through several examples and XAI (buttressing the need for better quality control), and 4) the public release of our model checkpoints and code to foster transparency.

