HateCheck: Functional Tests for Hate Speech Detection Models

12/31/2020
by Paul Röttger, et al.

Detecting online hate is a difficult task that even state-of-the-art models struggle with. In previous research, hate speech detection models are typically evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also risks overestimating generalisable model quality due to increasingly well-evidenced systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, we introduce HateCheck, a first suite of functional tests for hate speech detection models. We specify 29 model functionalities, the selection of which we motivate by reviewing previous research and through a series of interviews with civil society stakeholders. We craft test cases for each functionality and validate data quality through a structured annotation process. To illustrate HateCheck's utility, we test near-state-of-the-art transformer detection models as well as a popular commercial model, revealing critical model weaknesses.
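
The contrast between a single held-out score and per-functionality diagnostics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the functionality names (`slur_usage`, `negated_hate`), the placeholder test cases, and the toy `keyword_model` are hypothetical and are not taken from HateCheck or from any model tested in the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical test case: (functionality name, text, gold label).
TestCase = Tuple[str, str, str]


def accuracy_by_functionality(
    cases: List[TestCase], predict: Callable[[str], str]
) -> Dict[str, float]:
    """Return the share of correctly labelled cases for each functionality."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for functionality, text, gold in cases:
        total[functionality] += 1
        if predict(text) == gold:
            correct[functionality] += 1
    return {name: correct[name] / total[name] for name in total}


def keyword_model(text: str) -> str:
    """Toy stand-in for a detection model: flags any text containing a slur token."""
    return "hateful" if "[SLUR]" in text else "non-hateful"


# Placeholder cases written for this sketch, with masked tokens.
cases: List[TestCase] = [
    ("slur_usage", "You are a [SLUR].", "hateful"),
    ("slur_usage", "Typical [SLUR] behaviour.", "hateful"),
    ("negated_hate", "I would never call anyone a [SLUR].", "non-hateful"),
    ("negated_hate", "I do not hate [GROUP].", "non-hateful"),
]

scores = accuracy_by_functionality(cases, keyword_model)
overall = sum(keyword_model(text) == gold for _, text, gold in cases) / len(cases)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"overall: {overall:.2f}")
```

In this toy setup the aggregate accuracy hides the fact that the keyword model fails on half of the negated cases, which is the kind of specific weak point that a per-functionality breakdown is meant to surface.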

research
06/20/2022

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Hate speech detection models are typically evaluated on held-out test se...
research
04/30/2022

HateCheckHIn: Evaluating Hindi Hate Speech Detection Models

Due to the sheer volume of online hate, the AI and NLP communities have ...
research
04/08/2022

Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection

Behavioural testing – verifying system capabilities by validating human-...
research
05/22/2023

Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection

Hate speech is a severe issue that affects many online platforms. So far...
research
05/08/2020

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Although measuring held-out accuracy has been the primary approach to ev...
research
02/18/2023

Practical Flaky Test Prediction using Common Code Evolution and Test History Data

Non-deterministically behaving test cases cause developers to lose trust...
research
08/12/2021

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate

Detecting online hate is a complex task, and low-performing models have ...
