Improving Generalizability in Implicitly Abusive Language Detection with Concept Activation Vectors

04/05/2022
by Isar Nejadgholi, et al.

Robustness of machine learning models on ever-changing real-world data is critical, especially for applications affecting human well-being such as content moderation. New kinds of abusive language continually emerge in online discussions in response to current events (e.g., COVID-19), and the deployed abuse detection systems should be updated regularly to remain accurate. In this paper, we show that general abusive language classifiers tend to be fairly reliable in detecting out-of-domain explicitly abusive utterances but fail to detect new types of more subtle, implicit abuse. Next, we propose an interpretability technique, based on the Testing Concept Activation Vector (TCAV) method from computer vision, to quantify the sensitivity of a trained model to the human-defined concepts of explicit and implicit abusive language, and use that to explain the generalizability of the model on new data, in this case, COVID-related anti-Asian hate speech. Extending this technique, we introduce a novel metric, Degree of Explicitness, for a single instance and show that the new metric is beneficial in suggesting out-of-domain unlabeled examples to effectively enrich the training data with informative, implicitly abusive texts.
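The TCAV-style sensitivity the abstract describes can be illustrated with a minimal sketch. All data below is synthetic and hypothetical: the paper's actual setup uses activations of a trained abuse classifier together with human-curated examples of explicit and implicit abuse, whereas here random vectors stand in for both the activations and the gradients, and a mean-difference direction stands in for the linear classifier TCAV trains to obtain a Concept Activation Vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer activations: examples of an "explicit abuse"
# concept vs. random texts, each a 16-d vector (toy data).
concept_acts = rng.normal(loc=1.0, size=(50, 16))
random_acts = rng.normal(loc=0.0, size=(50, 16))

# CAV: direction separating concept activations from random ones.
# A mean-difference direction is used here as a simplification of
# training a linear classifier between the two sets.
cav = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
cav = cav / np.linalg.norm(cav)

# Hypothetical per-example gradients of the target-class logit with
# respect to the same layer's activations.
grads = rng.normal(size=(100, 16))

# TCAV score: fraction of inputs whose class logit increases when the
# activations move along the concept direction.
score = float(np.mean(grads @ cav > 0))
```

A model highly sensitive to the concept would yield a score near 1; a per-instance variant of this directional-derivative idea underlies the paper's Degree of Explicitness metric.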


Related research

12/28/2022 · Leveraging World Knowledge in Implicit Hate Speech Detection
08/31/2022 · Concept Gradient: Concept-based Interpretation Without Linear Assumption
02/07/2022 · PatClArC: Using Pattern Concept Activation Vectors for Noise-Robust Model Debugging
11/17/2017 · Vision Based Railway Track Monitoring using Deep Learning
11/30/2017 · TCAV: Relative concept importance testing with Linear Concept Activation Vectors
07/24/2022 · Inter-model Interpretability: Self-supervised Models as a Case Study
01/07/2021 · Corner case data description and detection
