Empirical Study of Text Augmentation on Social Media Text in Vietnamese

09/25/2020
by Son T. Luu, et al.

In text classification, label imbalance in a dataset affects the performance of text classification models. In practice, not all user comments on social networking sites stay visible: administrators often allow only positive comments and hide negative ones. As a result, data collected from user comments on social networks is usually skewed toward one label. This makes the dataset imbalanced and degrades the model's ability. Data augmentation techniques are applied to address the imbalance between the classes of a dataset and to increase the prediction model's accuracy. In this paper, we applied augmentation techniques to the VLSP2019 Hate Speech Detection dataset of Vietnamese social texts and to UIT-VSFC, the Vietnamese Students' Feedback Corpus for Sentiment Analysis. The result of augmentation increases by about 1.5
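The abstract does not spell out which augmentation operations are used. As a minimal sketch of how augmentation can rebalance classes, the code below assumes EDA-style token-level operations (random swap and random deletion) and oversamples every minority class with augmented copies until it matches the majority class count. All function names, the deletion probability `p`, and the whitespace tokenization are hypothetical illustrations, not the authors' implementation. Whitespace splitting also yields Vietnamese syllables rather than words, so a real pipeline would likely use a Vietnamese word segmenter first.

```python
import random
from collections import Counter


def random_swap(tokens, rng, n_swaps=1):
    """Swap two randomly chosen token positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, rng):
    """Apply one randomly chosen operation; assumes whitespace tokenization."""
    tokens = text.split()
    op = rng.choice([random_swap, random_deletion])
    return " ".join(op(tokens, rng))


def balance_by_augmentation(texts, labels, seed=0):
    """Oversample minority classes with augmented copies of their own examples."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    new_texts, new_labels = list(texts), list(labels)  # originals are kept unchanged
    for label, pool in by_label.items():
        for _ in range(target - counts[label]):
            new_texts.append(augment(rng.choice(pool), rng))
            new_labels.append(label)
    return new_texts, new_labels


texts = ["sản phẩm rất tốt", "giao hàng nhanh", "dịch vụ tốt lắm", "quá tệ"]
labels = ["pos", "pos", "pos", "neg"]
new_texts, new_labels = balance_by_augmentation(texts, labels, seed=42)
print(Counter(new_labels))  # both classes now have 3 examples
```

Augmenting only from a class's own examples keeps the generated texts label-consistent under the assumption that small token edits do not flip the label. That assumption is weaker for sentiment, where deleting a negation word can change polarity.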

research
07/06/2021

Identifying negativity factors from social media text corpus using sentiment analysis method

Automatic sentiment analysis plays a vital role in decision making. Many or...
research
06/21/2022

muBoost: An Effective Method for Solving Indic Multilingual Text Classification Problem

Text Classification is an integral part of many Natural Language Process...
research
05/05/2020

Creating a Multimodal Dataset of Images and Text to Study Abusive Language

In order to study online hate speech, the availability of datasets conta...
research
12/26/2019

Text Classification for Azerbaijani Language Using Machine Learning and Embedding

Text classification systems will help to solve the text clustering probl...
research
09/16/2019

Uncovering Flaming Events on News Media in Social Media

Social networking sites (SNSs) facilitate the sharing of ideas and infor...
research
06/12/2021

Study of sampling methods in sentiment analysis of imbalanced data

This work investigates the application of sampling methods for sentiment...
research
12/28/2022

Data Augmentation using Transformers and Similarity Measures for Improving Arabic Text Classification

Learning models are highly dependent on data to work effectively, and th...
