Modeling Profanity and Hate Speech in Social Media with Semantic Subspaces

06/14/2021
by   Vanessa Hahn, et al.

Hate speech and profanity detection suffer from data sparsity, especially for languages other than English, due to the subjective nature of the tasks and the resulting annotation incompatibility of existing corpora. In this study, we identify profane subspaces in word and sentence representations and explore their generalization capability on a variety of similar and distant target tasks in a zero-shot setting. This is done monolingually (German) and cross-lingually to a closely related (English), a distantly related (French), and an unrelated (Arabic) language. We observe that, on both similar and distant target tasks and across all languages, the subspace-based representations transfer more effectively than standard BERT representations in the zero-shot setting, with improvements between F1 +10.9 and F1 +42.9 over the baselines across all tested monolingual and cross-lingual scenarios.
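To make the core idea concrete, the following is a minimal sketch of how a "profane subspace" of an embedding space might be constructed and used as a feature extractor. It is an illustration only, not the paper's implementation: the random vectors stand in for BERT word or sentence embeddings of profane and neutral seed terms, and the seed sets, subspace dimensionality, and helper names are assumptions.

```python
import numpy as np

# Hypothetical stand-ins for encoder embeddings of profane vs. neutral seed
# terms. In practice these would come from a pretrained model such as BERT;
# random vectors are used here purely to keep the sketch self-contained.
rng = np.random.default_rng(0)
dim = 16
profane = rng.normal(size=(20, dim)) + 2.0  # artificially shifted cluster
neutral = rng.normal(size=(20, dim))

# Difference vectors between paired seed embeddings span candidate
# "profanity" directions in the representation space.
diffs = profane - neutral

# The leading principal components of the (centered) difference vectors
# define a low-dimensional profane subspace.
diffs_centered = diffs - diffs.mean(axis=0)
_, _, vt = np.linalg.svd(diffs_centered, full_matrices=False)
k = 3
subspace = vt[:k]  # (k, dim) orthonormal basis of the subspace

def project(x: np.ndarray) -> np.ndarray:
    """Coordinates of embeddings x in the k-dimensional profane subspace."""
    return x @ subspace.T

# Any new word/sentence embedding can now be reduced to its subspace
# coordinates and passed to a lightweight classifier, which is what makes
# zero-shot transfer to other tasks and languages cheap.
features = project(rng.normal(size=(5, dim)))
print(features.shape)  # (5, 3)
```

The design intuition is that the subspace discards most task-irrelevant variation in the full embedding, so a classifier trained on its coordinates depends less on language- or corpus-specific surface features.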

Related research

09/04/2021: On the ability of monolingual models to learn language-agnostic representations
  Pretrained multilingual models have become a de facto default approach f...

05/26/2020: English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too
  Intermediate-task training has been shown to substantially improve pretr...

02/15/2021: Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers
  We explore cross-lingual transfer of register classification for web doc...

09/14/2021: Improving Zero-shot Cross-lingual Transfer between Closely Related Languages by injecting Character-level Noise
  Cross-lingual transfer between a high-resource language and its dialects...

09/14/2021: Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction
  Zero-shot cross-lingual information extraction (IE) describes the constr...

10/20/2021: Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction
  We evaluate a simple approach to improving zero-shot multilingual transf...

04/20/2023: Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages
  One of the challenges with finetuning pretrained language models (PLMs) ...
