Trawling for Trolling: A Dataset

by Hitkul, et al.

The ability to accurately detect and filter offensive content automatically is important for ensuring a rich and diverse digital discourse. Trolling is a type of hurtful or offensive content that is prevalent on social media but underrepresented in datasets for offensive content detection. In this work, we present a dataset that models trolling as a subcategory of offensive content. The dataset was created by collecting samples from well-known datasets and re-annotating them according to precise definitions of different categories of offensive content. The dataset has 12,490 samples, split across five classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech. It encompasses content from Twitter, Reddit, and Wikipedia Talk Pages. Models trained on our dataset show appreciable performance without any significant hyperparameter tuning and can potentially learn meaningful linguistic information effectively. We find that these models are sensitive to data ablation, which suggests that the dataset is largely devoid of spurious statistical artefacts that could otherwise distract and confuse classification models.




