The Challenges of Persian User-generated Textual Content: A Machine Learning-Based Approach

01/20/2021
by Mohammad Kasra Habib, et al.

In recent years, many studies have developed approaches that exploit large amounts of user-generated content to build intelligent predictive models. This research applies machine learning-based approaches to tackle the hurdles that come with Persian user-generated textual content. Research on using machine learning to classify or cluster Persian text remains inadequate. Further, Persian text analysis suffers from a lack of resources, specifically datasets and text manipulation tools. Because the syntax and semantics of Persian differ from those of English and other languages, the resources available for those languages are not directly usable for Persian. In addition, noun and pronoun recognition, part-of-speech tagging, word boundary detection, stemming, and character manipulation for Persian are still unsolved issues that require further study. This research therefore attempts to address some of these challenges. The presented approach uses a machine-translated dataset to conduct sentiment analysis for Persian. The dataset was evaluated with different classifiers and feature engineering approaches. The experiments show promising, state-of-the-art performance compared with previous efforts; the best classifier was Support Vector Machines, which achieved a precision of 91.22
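To make the character manipulation and word boundary issues mentioned in the abstract concrete, the following is a minimal Python sketch of a normalization step that might precede feature extraction. It is not the authors' pipeline. The function names `normalize_persian` and `tokenize` are hypothetical, and the choices to map Arabic yeh and kaf to their Persian forms, strip diacritics, and keep the zero-width non-joiner inside tokens are assumptions made for illustration.

```python
import re
import unicodedata

# Arabic code points that often appear in user-generated Persian text
# in place of the visually similar Persian letters (assumed mapping).
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian keheh
})


def normalize_persian(text: str) -> str:
    """Unify look-alike letters, drop diacritics, collapse whitespace."""
    text = text.translate(ARABIC_TO_PERSIAN)
    # Remove nonspacing combining marks (e.g. fatha, kasra).
    # The zero-width non-joiner (U+200C) has category Cf, so it is kept.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split on whitespace only; U+200C stays inside a token because
    it joins the parts of a single Persian word."""
    return normalize_persian(text).split()
```

Without this kind of unification, the same word typed with Arabic and Persian code points would be counted as two distinct features by any bag-of-words classifier.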


