Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection

03/23/2022
by Tulika Bose, et al.

Hate speech classifiers exhibit substantial performance degradation when evaluated on datasets different from the source. This degradation stems from the classifiers learning spurious correlations between hate speech labels in the training corpus and words that are not necessarily relevant to hateful language. Previous work has attempted to mitigate this problem by regularizing specific terms from pre-defined static dictionaries. While this has been shown to improve the generalizability of classifiers, the coverage of such methods is limited and the dictionaries require regular manual updates from human experts. In this paper, we propose to automatically identify and reduce spurious correlations using attribution methods, dynamically refining the list of terms to be regularized during training. Our approach is flexible and improves cross-corpora performance over previous work, both on its own and in combination with pre-defined dictionaries.
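
As a rough illustration of the idea in the abstract, the sketch below trains a toy bag-of-words logistic regression, scores each term with an input-times-weight attribution, and after every epoch rebuilds the list of terms whose attribution is penalized. Everything here is an assumption made for illustration and is not taken from the paper: the linear model, the attribution measure, the top-k selection rule, the `protected` set of genuinely hateful terms, and the toy data (where "slur" stands in for a hateful term and "football" for a spuriously correlated one).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the hate class for a document given as {term: count}."""
    return sigmoid(bias + sum(weights[t] * c for t, c in x.items()))


def term_attributions(weights, data):
    """Mean input-times-weight attribution of each term toward the hate class."""
    totals, counts = {}, {}
    for x, _ in data:
        for t, c in x.items():
            totals[t] = totals.get(t, 0.0) + weights[t] * c
            counts[t] = counts.get(t, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}


def refine_term_list(attributions, k, protected):
    """Assumed selection rule: top-k positively attributed terms outside `protected`."""
    candidates = [(a, t) for t, a in attributions.items()
                  if t not in protected and a > 0]
    candidates.sort(reverse=True)
    return {t for _, t in candidates[:k]}


def train(data, vocab, protected, epochs=200, lr=0.5, lam=0.5, k=1):
    weights = {t: 0.0 for t in vocab}
    bias = 0.0
    penalized = set()  # refined after every epoch
    for _ in range(epochs):
        for x, y in data:
            err = predict(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            bias -= lr * err
            for t, c in x.items():
                grad = err * c
                if t in penalized:
                    # gradient of the penalty lam * (weight * count)^2
                    grad += 2 * lam * weights[t] * c * c
                weights[t] -= lr * grad
        penalized = refine_term_list(term_attributions(weights, data), k, protected)
    return weights, bias, penalized


# Toy corpus (hypothetical): "football" only ever co-occurs with the hate label.
data = [
    ({"slur": 1, "football": 1}, 1),
    ({"slur": 1, "football": 1}, 1),
    ({"slur": 1}, 1),
    ({"weather": 1}, 0),
    ({"weather": 1, "match": 1}, 0),
]
vocab = ["slur", "football", "weather", "match"]
protected = {"slur"}

plain_w, plain_b, _ = train(data, vocab, protected, lam=0.0)
reg_w, reg_b, penalized = train(data, vocab, protected, lam=0.5)
print("penalized terms:", penalized)
print("football weight without / with penalty:", plain_w["football"], reg_w["football"])
```

In this toy setting the penalized list is recomputed from the current model rather than fixed in advance, which is the contrast with a static dictionary that the abstract draws. A real system would replace the linear attribution with a method suited to the classifier in use.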

