Is preprocessing of text really worth your time for online comment classification?

06/07/2018
by Fahim Mohammad, et al.

A large proportion of online comments on public domains are constructive; however, a significant proportion are toxic in nature. The comments contain many typos, which multiply the number of features and make the ML model harder to train. Considering that data scientists spend approximately 80% of their time organizing their data [1], we explored how much effort we should invest in preprocessing (transforming) raw comments before feeding them to state-of-the-art classification models. Using four models on the Jigsaw toxic comment classification data, we demonstrated that training a model without any transformation produces a relatively decent model. Applying even basic transformations led, in some cases, to worse performance, so they should be applied with caution.
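To make the claim about typos inflating the feature count concrete, here is a minimal sketch in Python. It assumes a plain bag-of-words setup in which every distinct token is one feature. The toy comments and the helper names `raw_tokens`, `basic_tokens`, and `vocabulary` are hypothetical illustrations, not the authors' pipeline or the Jigsaw data.

```python
import re

# Hypothetical toy comments (NOT from the Jigsaw dataset); the casing,
# punctuation, and misspelling variants mimic noise in raw online comments.
comments = [
    "You are STUPID!!",
    "you are stupid",
    "You are stuupid...",
    "Thanks, this is helpful",
    "thanks this is HELPFUL!",
]

def raw_tokens(text):
    # No transformation: split on whitespace only.
    return text.split()

def basic_tokens(text):
    # A basic transformation (assumed for illustration):
    # lowercase, then keep only runs of letters a-z.
    return re.findall(r"[a-z]+", text.lower())

def vocabulary(docs, tokenize):
    # Each distinct token becomes one feature in a bag-of-words model.
    return {tok for doc in docs for tok in tokenize(doc)}

raw_vocab = vocabulary(comments, raw_tokens)
basic_vocab = vocabulary(comments, basic_tokens)
print(len(raw_vocab), len(basic_vocab))  # → 12 8
```

Lowercasing and stripping punctuation shrink the vocabulary from 12 to 8 features, but the misspelling "stuupid" still survives as its own feature. Whether such transformations actually help a classifier is the question the paper investigates; this sketch only shows their effect on vocabulary size.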

research
10/02/2018

Who is Addressed in this Comment? Automatically Classifying Meta-Comments in News Comments

User comments have become an essential part of online journalism. Howeve...
research
08/23/2022

Preprocessing Source Code Comments for Linguistic Models

Comments are an important part of the source code and are a primary sour...
research
09/21/2021

Not All Comments are Equal: Insights into Comment Moderation from a Topic-Aware Model

Moderation of reader comments is a significant problem for online news p...
research
10/02/2022

ReAct: A Review Comment Dataset for Actionability (and more)

Review comments play an important role in the evolution of documents. Fo...
research
04/18/2018

Forecasting the presence and intensity of hostility on Instagram using linguistic and social features

Online antisocial behavior, such as cyberbullying, harassment, and troll...
research
10/14/2020

Six Attributes of Unhealthy Conversation

We present a new dataset of approximately 44000 comments labeled by crow...
research
03/02/2018

CLX: Towards a scalable and comprehensible design of PBE data transformations

Effective data analytics on data collected from the real world usually b...
