An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression

04/16/2019
by   Sandip Modha, et al.

This paper studies the effectiveness of text representation schemes on two tasks: user aggression detection and fact detection in social media content. In user aggression detection, the aim is to identify the level of aggression in social media content written in English, Devanagari Hindi, and Romanized Hindi. Aggression levels are categorized into three predefined classes: `Non-aggressive`, `Overtly Aggressive`, and `Covertly Aggressive`. During disaster-related incidents, social media platforms such as Twitter are flooded with millions of posts. In such emergency situations, identifying factual posts is important for organizations involved in relief operations. We frame this problem as a combination of classification and ranking. This paper compares text representation schemes based on bag-of-words (BoW) techniques, distributed word/sentence representations, and transfer learning across a range of classifiers. The weighted F_1 score is used as the primary evaluation metric. Results show that BoW representations perform better than word embeddings on machine learning classifiers, while pre-trained word embeddings perform better on classifiers based on deep neural networks. Recent transfer learning models such as ELMo and ULMFiT are fine-tuned for the aggression classification task; however, their results are not on par with pre-trained word embedding models. Overall, word embeddings from fastText produce a better weighted F_1 score than Word2Vec and GloVe, and results improve further with pre-trained vectors. Statistical significance tests are employed to confirm the significance of the classification results. When the test dataset is lexically different from the training dataset, deep neural models are more robust and perform substantially better than machine learning classifiers.
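For readers unfamiliar with the metric, the weighted F_1 score mentioned in the abstract can be sketched as follows. This is a minimal illustration under the usual definition (per-class F_1 averaged with weights proportional to each class's share of the true labels), not the authors' evaluation code; the function name `weighted_f1` and the `LABELS` list are assumptions made for this example, with the class names taken from the abstract.

```python
from collections import Counter

# Class names as given in the abstract; the list itself is an assumption
# made for this illustration.
LABELS = ["Non-aggressive", "Overtly Aggressive", "Covertly Aggressive"]


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Support-weighted average of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0

        # Each class contributes in proportion to its share of true labels.
        score += (support[label] / total) * f1
    return score
```

Because each class is weighted by its support, a frequent class such as `Non-aggressive` influences the score more than a rare one, which is one reason the metric is commonly chosen for imbalanced classification data.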


