Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density

11/02/2021
by Juuso Eronen, et al.

We study the effectiveness of Feature Density (FD) under different linguistically-backed feature preprocessing methods as a way to estimate dataset complexity. That estimate is then used to compare the potential performance of machine learning (ML) classifiers before any training takes place. We hypothesise that estimating dataset complexity makes it possible to reduce the number of required experiment iterations. This would help optimize the resource-intensive training of ML models, which is becoming a serious issue as available datasets grow and models based on Deep Neural Networks (DNN) become ever more popular. The constantly increasing need for more powerful computational resources also affects the environment, because training large-scale ML models produces an alarmingly growing amount of CO2 emissions. The research was conducted on multiple datasets. These include popular ones, such as the Yelp business review dataset used for training typical sentiment analysis models, and more recent datasets that tackle cyberbullying. Cyberbullying is a serious social problem, and it is also a much more sophisticated problem from the point of view of linguistic representation. We use cyberbullying datasets collected for multiple languages, namely English, Japanese and Polish. The difference in linguistic complexity between these datasets also allows us to discuss the efficacy of linguistically-backed word preprocessing.
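The abstract does not spell out how Feature Density is computed. The sketch below assumes that FD is the number of distinct features divided by the total number of feature occurrences in a dataset. It shows how switching the preprocessing step changes the value for the same corpus. The functions `feature_density`, `surface_tokens` and `lowercase_tokens` and the toy corpus are hypothetical stand-ins, not the authors' code or their linguistically-backed preprocessing methods.

```python
from typing import Callable, Iterable, List


def feature_density(docs: Iterable[str], extract: Callable[[str], List[str]]) -> float:
    """Distinct features / total feature occurrences (an ASSUMED definition of FD)."""
    total = 0
    unique = set()
    for doc in docs:
        feats = extract(doc)  # apply the chosen preprocessing to one document
        total += len(feats)
        unique.update(feats)
    return len(unique) / total if total else 0.0


# Hypothetical preprocessing variants; the paper's actual methods are not shown here.
def surface_tokens(text: str) -> List[str]:
    """Whitespace tokens with case preserved."""
    return text.split()


def lowercase_tokens(text: str) -> List[str]:
    """Whitespace tokens after lowercasing, so 'So' and 'so' collapse into one feature."""
    return text.lower().split()


# Toy corpus, invented for illustration only.
corpus = ["You are so dumb", "you are great", "So great"]

print(f"{feature_density(corpus, surface_tokens):.3f}")    # 0.778 (7 distinct / 9 total)
print(f"{feature_density(corpus, lowercase_tokens):.3f}")  # 0.556 (5 distinct / 9 total)
```

Normalizing features, here by lowercasing, merges variants into fewer distinct features, so the density drops. How such values are then related to expected classifier performance is the subject of the study and is not reproduced in this sketch.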
