Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity

10/25/2019
by Chris Emmery, et al.

The detection of cyberbullying has grown in societal importance, in popularity as a research topic, and in the amount of available open data. Nevertheless, while computational power and the affordability of resources continue to increase, access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets without a thorough evaluation of applicability. In this paper, we further illustrate these issues. We (i) evaluate many publicly available resources for this task and demonstrate the difficulties of data collection: these resources predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework so that others can replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be used to enrich real data. This largely circumvents the restrictions on the data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.
