Constructive and Toxic Speech Detection for Open-domain Social Media Comments in Vietnamese

03/18/2021
by Luan Thanh Nguyen, et al.

The rise of social media has led to a growing volume of comments on online forums. However, some of these comments remain invalid, offering little information to users. Moreover, such comments are often toxic and harmful to people. In this paper, we create a dataset for constructive and toxic speech detection, named UIT-ViCTSD (Vietnamese Constructive and Toxic Speech Detection dataset), with 10,000 human-annotated comments. For these tasks, we propose a system for constructive and toxic speech detection built on PhoBERT, the state-of-the-art transfer learning model in Vietnamese NLP. With this system, we achieve an F1-score of 78.59 for identifying constructive and toxic comments separately. In addition, to assess the dataset objectively, we implement a variety of baseline models, including traditional machine learning and deep neural network-based models. These results can help address problems in online discussions and support the development of a framework for automatically identifying constructiveness and toxicity in Vietnamese social media comments.
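
As a reading aid for the reported metric, the following is a minimal sketch of how an F1-score can be computed separately for the two tasks. It assumes binary labels and positive-class F1; the abstract does not state the label scheme or the averaging used, so this is not necessarily the authors' evaluation procedure. The function name `f1_score` and all label lists are invented for illustration and are not taken from UIT-ViCTSD.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Positive-class F1: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, NOT from the dataset): 1 = positive class.
constructive_true = [1, 0, 1, 1, 0, 0]
constructive_pred = [1, 0, 0, 1, 1, 0]
toxic_true = [0, 0, 1, 0, 1, 0]
toxic_pred = [0, 0, 1, 0, 0, 0]

# Each task is scored on its own, mirroring "separately" in the abstract.
constructive_f1 = f1_score(constructive_true, constructive_pred)
toxic_f1 = f1_score(toxic_true, toxic_pred)
print(f"constructive F1 = {constructive_f1:.4f}, toxic F1 = {toxic_f1:.4f}")
```

Scoring each task independently keeps a weak result on one label set from being hidden by a strong result on the other.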

