Detecting Offensive Content in Open-domain Conversations using Two Stage Semi-supervision

11/30/2018
by   Chandra Khatri, et al.

As open-ended human-chatbot interaction becomes commonplace, sensitive content detection gains importance. In this work, we propose a two-stage semi-supervised approach to bootstrap large-scale data for automatic sensitive language detection from publicly available web resources. We explore various data selection methods, including 1) using a blacklist to rank online discussion forums by their level of sensitivity and then randomly sampling utterances, and 2) training a weakly supervised model in conjunction with the blacklist to score sentences from online discussion forums and curate a dataset. Our data collection strategy is flexible and allows the models to detect implicit sensitive content for which manual annotations may be difficult. We train models using publicly available annotated datasets as well as the proposed large-scale semi-supervised datasets. We evaluate all the models on Twitter and Toxic Wikipedia comments test sets as well as on a manually annotated spoken language dataset collected during a large-scale chatbot competition. Results show that a model trained on this collected data outperforms the baseline models by a large margin on both in-domain and out-of-domain test sets, achieving an F1 score of 95.5% on an out-of-domain test set compared to a score of 75% for models trained on public datasets. We also show that large-scale two-stage semi-supervision generalizes well across multiple classes of sensitive content, such as hate speech, racism, and sexual and pornographic content, without explicit labels for these classes, leading to an average recall of 95.5% compared with models trained using annotated public datasets, which achieve an average recall of 73.2%.
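The two data selection methods described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the hit-ratio ranking criterion, the `top_k` and `per_forum` parameters, the rule for combining the blacklist with a model score, and the placeholder blacklist term `badword` are all assumptions made for illustration.

```python
import random
import re


def tokenize(text):
    # Lowercase word tokenizer; a real system would likely need something richer.
    return re.findall(r"[a-z']+", text.lower())


def contains_blacklisted(utterance, blacklist):
    return any(tok in blacklist for tok in tokenize(utterance))


def rank_forums(forums, blacklist):
    """Rank forums by the fraction of utterances containing a blacklisted term.

    forums: dict mapping forum name -> list of utterances.
    Returns a list of (name, ratio) pairs, most sensitive first.
    """
    scores = {}
    for name, utterances in forums.items():
        hits = sum(contains_blacklisted(u, blacklist) for u in utterances)
        scores[name] = hits / len(utterances) if utterances else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def sample_weak_labels(forums, ranking, top_k, per_forum, seed=0):
    """Method 1 (assumed form): randomly sample utterances from the top_k most
    sensitive forums as label 1 and from the top_k least sensitive as label 0."""
    rng = random.Random(seed)
    names = [name for name, _ in ranking]
    labeled = []
    for label, group in ((1, names[:top_k]), (0, names[-top_k:])):
        for name in group:
            pool = forums[name]
            for u in rng.sample(pool, min(per_forum, len(pool))):
                labeled.append((u, label))
    return labeled


def second_stage_select(candidates, blacklist, model_score, threshold=0.5):
    """Method 2 (assumed combination rule): keep a sentence if it hits the
    blacklist OR the weakly supervised model scores it at or above threshold."""
    return [
        u for u in candidates
        if contains_blacklisted(u, blacklist) or model_score(u) >= threshold
    ]


# Toy data; "badword" stands in for a real blacklist entry.
blacklist = {"badword"}
forums = {
    "forum_a": ["you are a badword", "badword again", "nice day"],
    "forum_b": ["hello there", "lovely weather", "good morning"],
    "forum_c": ["that is a badword", "see you soon", "thanks a lot"],
}
ranking = rank_forums(forums, blacklist)
weak_data = sample_weak_labels(forums, ranking, top_k=1, per_forum=2)

# Stand-in for a model trained on weak_data; any callable returning a score works.
toy_model = lambda u: 0.9 if "mild" in u else 0.1
selected = second_stage_select(["a badword", "mild text", "clean"], blacklist, toy_model)
```

In this sketch the second stage can pick up sentences that contain no blacklisted term, which is one way a pipeline of this kind could capture the implicit sensitive content the abstract mentions. Note that `top_k` should be at most half the number of forums, otherwise the two label groups overlap.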


