Sociocultural knowledge is needed for selection of shots in hate speech detection tasks

04/04/2023
by Antonis Maronikolakis, et al.

We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for the countries of Brazil, Germany, India and Kenya, to aid training and interpretability of models. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method to aid shot selection for training in low-resource settings via HATELEXICON. In few-shot learning, the selection of shots is of paramount importance to model performance. In our work, we simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that selecting shots based on our lexicon leads to models performing better on MHC than models trained on shots sampled randomly. Thus, when given only a few training examples, using our lexicon to select shots containing more sociocultural information leads to better few-shot performance.
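The shot-selection idea can be illustrated with a minimal sketch. It assumes the lexicon is a set of single-word terms (slurs and targets) and that candidate training examples are (text, label) pairs. Each candidate is scored by how many lexicon terms it contains, and the top-k are taken as shots. A random-sampling baseline is included for comparison. The function names, whitespace tokenization, and top-k ranking rule are hypothetical choices for illustration, not the authors' exact procedure.

```python
import random
import string
from typing import Iterable, List, Sequence, Set, Tuple

Example = Tuple[str, int]  # (text, label); hypothetical data format


def tokenize(text: str) -> List[str]:
    # Whitespace split plus edge-punctuation stripping (assumption: works for
    # German and Devanagari text without splitting words on combining marks).
    return [t.strip(string.punctuation) for t in text.lower().split()]


def lexicon_score(text: str, lexicon: Iterable[str]) -> int:
    # Count tokens that appear in the (lowercased) lexicon.
    lex = {term.lower() for term in lexicon}
    return sum(1 for tok in tokenize(text) if tok in lex)


def select_shots_by_lexicon(
    pool: Sequence[Example], lexicon: Set[str], k: int
) -> List[Example]:
    # Rank candidates by lexicon hits, highest first. Python's sort stays
    # stable with reverse=True, so ties keep their original pool order.
    lex = {term.lower() for term in lexicon}
    ranked = sorted(pool, key=lambda ex: lexicon_score(ex[0], lex), reverse=True)
    return list(ranked[:k])


def select_shots_randomly(
    pool: Sequence[Example], k: int, seed: int = 0
) -> List[Example]:
    # Baseline: uniform sampling without replacement.
    rng = random.Random(seed)
    return rng.sample(list(pool), k)


if __name__ == "__main__":
    # Toy placeholder data; "target_a" and "slur_b" stand in for lexicon entries.
    pool = [
        ("a neutral sentence", 0),
        ("something about target_a", 1),
        ("target_a and slur_b here", 1),
        ("another neutral one", 0),
    ]
    lexicon = {"target_a", "slur_b"}
    print(select_shots_by_lexicon(pool, lexicon, 2))
    print(select_shots_randomly(pool, 2, seed=1))
```

This simple ranking ignores class balance and multi-word lexicon entries; a practical shot selector would likely need to handle both.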


Related research

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech (05/25/2022)
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Repre...

Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU (09/17/2021)
While pre-trained language models have obtained state-of-the-art perform...

Learning to Classify Intents and Slot Labels Given a Handful of Examples (04/22/2020)
Intent classification (IC) and slot filling (SF) are core components in ...

Deep Learning Models for Multilingual Hate Speech Detection (04/14/2020)
Hate speech detection is a challenging problem with most of the datasets...

Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners (03/05/2021)
We present a multilingual end-to-end Text-To-Speech framework that maps ...

ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection (05/25/2022)
Hate speech detection is complex; it relies on commonsense reasoning, kn...

Visually grounded few-shot word learning in low-resource settings (06/20/2023)
We propose a visually grounded speech model that learns new words and th...