A Weakly Supervised Classifier and Dataset of White Supremacist Language

06/27/2023
by Michael Miller Yoder, et al.

We present a dataset and classifier for detecting the language of white supremacist extremism, a growing issue in online hate speech. Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data from similar domains. We demonstrate that this approach improves generalization performance to new domains. Incorporating anti-racist texts as counterexamples to white supremacist language mitigates bias.
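The weak supervision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each text inherits its label from the kind of domain it was collected from, so no per-text annotation is needed. The names `Document`, `weak_label`, `build_training_set`, and the source category strings are hypothetical, and the texts are placeholders.

```python
from dataclasses import dataclass

# Hypothetical source categories, following the three kinds of data
# named in the abstract. The category strings are assumptions.
POSITIVE_SOURCES = {"white_supremacist"}
NEGATIVE_SOURCES = {"neutral", "anti_racist"}


@dataclass
class Document:
    text: str
    source_type: str  # category of the domain the text was collected from


def weak_label(doc: Document) -> int:
    """Label a document from its source alone, with no per-text annotation."""
    if doc.source_type in POSITIVE_SOURCES:
        return 1
    if doc.source_type in NEGATIVE_SOURCES:
        return 0
    raise ValueError(f"unknown source type: {doc.source_type!r}")


def build_training_set(docs):
    """Return (text, label) pairs using the source-derived labels."""
    return [(d.text, weak_label(d)) for d in docs]


# Placeholder texts stand in for real data.
corpus = [
    Document("<text from an explicitly white supremacist domain>", "white_supremacist"),
    Document("<text from a neutral domain>", "neutral"),
    Document("<text from an anti-racist domain>", "anti_racist"),
]
training_set = build_training_set(corpus)
labels = [label for _, label in training_set]
```

In this sketch the anti-racist texts are negative examples, so a downstream classifier cannot rely only on the presence of race-related vocabulary to predict the positive class.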


