BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection

05/26/2020
by Jihyung Moon, et al.

Toxic comments on online platforms are an unavoidable social issue under the cloak of anonymity. Hate speech detection has been actively studied for languages such as English, German, and Italian, for which manually labeled corpora have been released. In this work, we first present 9.4K manually labeled entertainment news comments for identifying Korean toxic speech, collected from a widely used online news platform in Korea. The comments are annotated for both social bias and hate speech, since the two aspects are correlated. Inter-annotator agreement, measured by Krippendorff's alpha, is 0.492 for social bias and 0.496 for hate speech. We provide benchmarks using CharCNN, BiLSTM, and BERT, where BERT achieves the highest score on all tasks. The models generally perform better on bias identification, since hate speech detection is a more subjective task. Additionally, when BERT is trained with the bias label for hate speech detection, the prediction score increases, implying that bias and hate are intertwined. We make our dataset publicly available and open competitions based on the corpus and benchmarks.
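For readers unfamiliar with the agreement measure quoted above, the following is a minimal sketch of Krippendorff's alpha for nominal labels, written from the standard coincidence-matrix definition. It is not the authors' evaluation code; the function name `krippendorff_alpha_nominal`, the input layout (one list of labels per comment, `None` for a missing rating), and the toy labels are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list with one entry per annotated item; each entry is a
    list of the labels given by the annotators (None = no rating).
    This layout is an illustrative assumption, not the corpus format.
    """
    # Coincidence matrix: every ordered pair of ratings within one item
    # contributes 1 / (m - 1), where m is the number of ratings on that item.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two ratings carry no pairs
        for a, b in permutations(range(m), 2):
            coincidences[(values[a], values[b])] += 1.0 / (m - 1)

    # Marginal totals n_c and overall number of pairable ratings n.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable ratings")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label ever used: no disagreement is possible
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical toy annotations from three annotators on four comments.
    toy = [
        ["hate", "hate", "hate"],
        ["none", "none", "offensive"],
        ["offensive", "offensive", None],
        ["none", "none", "none"],
    ]
    print(round(krippendorff_alpha_nominal(toy), 3))
```

An alpha of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance, so scores around 0.49 correspond to moderate agreement on a subjective labeling task.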

research
08/23/2022

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

Online Hate speech detection has become important with the growth of dig...
research
04/28/2022

Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online

Even though hate speech (HS) online has been an important object of rese...
research
04/06/2021

hBert + BiasCorp – Fighting Racism on the Web

Subtle and overt racism is still present both in physical and online com...
research
01/24/2023

ViHOS: Hate Speech Spans Detection for Vietnamese

The rise in hateful and offensive language directed at other users is on...
research
02/25/2022

APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets

Detecting toxic or pejorative expressions in online communities has beco...
research
04/11/2020

Classifying Constructive Comments

We introduce the Constructive Comments Corpus (C3), comprised of 12,000 ...
research
08/09/2022

Exploring Hate Speech Detection with HateXplain and BERT

Hate Speech takes many forms to target communities with derogatory comme...
