Cyberbullying Detection -- Technical Report 2/2018, Department of Computer Science AGH, University of Science and Technology

08/02/2018
by Michał Ptaszyński, et al.

The research described in this paper concerns automatic cyberbullying detection in social media. It has two goals: building a gold-standard cyberbullying detection dataset and measuring the performance of the Samurai cyberbullying detection system. As part of the research, the Formspring dataset provided in a Kaggle competition was re-annotated. The annotation procedure is described in detail and, unlike many other recent data annotation initiatives, does not rely on Mechanical Turk to recruit annotators. The new annotation appears to be more coherent than the old one, since all tested cyberbullying detection systems performed better on the new annotation. The performance of the Samurai system is compared with five commercial systems and with Fasttext, a well-known machine learning algorithm for classifying textual content. Samurai scores best on all measures (accuracy, precision and recall), while Fasttext is the second-best performing algorithm.
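
For readers unfamiliar with the three measures named above, the following is a minimal sketch of how accuracy, precision and recall are conventionally computed for a binary task such as cyberbullying detection. It is not taken from the report: the function name `binary_metrics`, the label encoding (1 for cyberbullying, 0 otherwise) and the toy labels are assumptions made purely for illustration.

```python
def binary_metrics(gold, predicted, positive=1):
    """Return accuracy, precision and recall for binary labels.

    `gold` and `predicted` are equal-length sequences of labels.
    `positive` marks the class of interest (assumed here: 1 = cyberbullying).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    tn = sum(1 for g, p in zip(gold, predicted) if g != positive and p != positive)

    total = len(gold)
    return {
        # Share of all items labelled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of items flagged as positive that really are positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of truly positive items that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels, not data from the report.
gold_labels = [1, 0, 1, 1, 0, 0]
system_output = [1, 0, 0, 1, 1, 0]
scores = binary_metrics(gold_labels, system_output)
```

Reporting precision and recall alongside accuracy matters when the positive class is rare, because a system that never flags anything can still reach high accuracy.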

