Sentiment Analysis on Brazilian Portuguese User Reviews

12/10/2021
by Frederico Souza, et al.

Sentiment analysis is one of the most classical and widely studied natural language processing tasks, and it has advanced notably with the proposal of more complex and scalable machine learning models. Despite this progress, Brazilian Portuguese still has only limited linguistic resources, such as datasets dedicated to sentiment classification. This is especially true of datasets with predefined partitions into training, testing, and validation sets, which would allow a fairer comparison of alternative algorithms. Motivated by these issues, this work analyzes the predictive performance of a range of document embedding strategies, taking polarity as the system output. The analysis covers five sentiment analysis datasets in Brazilian Portuguese, unified into a single dataset, together with a reference partitioning into training, testing, and validation sets. Both are made publicly available through a digital repository. A cross-evaluation of dataset-specific models over different contexts assesses their generalization capabilities and the feasibility of adopting a single model for all scenarios.
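The abstract names two procedures without giving their details: a fixed reference partition of each dataset, and a cross-evaluation in which every dataset-specific model is tested on every dataset. The following is a minimal Python sketch of both, under stated assumptions. It uses a stratified 70/15/15 split with a fixed seed (the paper's actual ratios are not stated here). It also swaps in a bag-of-words multinomial Naive Bayes classifier in place of the document-embedding models the paper evaluates. Finally, two hypothetical toy corpora (`apps`, `filmes`) stand in for the five real datasets. All function and dataset names are illustrative, not taken from the paper or its repository.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(examples, train_pct=70, val_pct=15, seed=42):
    """Partition (text, label) pairs into fixed train/val/test sets,
    keeping each label's share roughly equal across splits (assumed ratios)."""
    rng = random.Random(seed)  # fixed seed -> reproducible reference partition
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        n_train = len(items) * train_pct // 100
        n_val = len(items) * val_pct // 100
        splits["train"] += items[:n_train]
        splits["val"] += items[n_train:n_train + n_val]
        splits["test"] += items[n_train + n_val:]  # remainder goes to test
    return splits


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Hypothetical stand-in for the paper's document-embedding models:
    multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for w in tokenize(text):  # add smoothed log likelihoods
                score += math.log((self.word_counts[c][w] + 1)
                                  / (self.totals[c] + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, examples):
    return sum(model.predict(t) == y for t, y in examples) / len(examples)


def cross_evaluate(datasets):
    """Train one model per dataset, then score it on every dataset's test split.
    Key (src, tgt) -> accuracy of the model trained on src, tested on tgt."""
    splits = {name: stratified_split(data) for name, data in datasets.items()}
    models = {name: NaiveBayes().fit(s["train"]) for name, s in splits.items()}
    return {(src, tgt): accuracy(models[src], splits[tgt]["test"])
            for src in splits for tgt in splits}


# Hypothetical toy reviews: two tiny corpora sharing polarity words but
# differing in domain vocabulary, standing in for the real datasets.
POSITIVE = ["ótimo", "excelente", "bom", "adorei", "recomendo"]
NEGATIVE = ["péssimo", "ruim", "horrível", "odiei", "decepcionante"]


def toy_dataset(nouns):
    return ([(f"{w} {n}", "pos") for w in POSITIVE for n in nouns]
            + [(f"{w} {n}", "neg") for w in NEGATIVE for n in nouns])


DATASETS = {
    "apps": toy_dataset(["aplicativo", "app"]),
    "filmes": toy_dataset(["filme", "roteiro"]),
}

if __name__ == "__main__":
    for (src, tgt), acc in sorted(cross_evaluate(DATASETS).items()):
        print(f"train={src:7s} test={tgt:7s} accuracy={acc:.2f}")
```

Diagonal entries of the resulting matrix (training and testing on the same dataset) give in-domain accuracy. Off-diagonal entries show how well a model trained on one source transfers to another, which is the evidence needed to decide between one unified model and several dataset-specific ones.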

research
12/04/2022

An LSTM model for Twitter Sentiment Analysis

Sentiment analysis on social media such as Twitter provides organization...
research
08/05/2021

Bambara Language Dataset for Sentiment Analysis

For easier communication, posting, or commenting on each others posts, p...
research
11/25/2014

LABR: A Large Scale Arabic Sentiment Analysis Benchmark

We introduce LABR, the largest sentiment analysis dataset to-date for th...
research
06/28/2021

Current Landscape of the Russian Sentiment Corpora

Currently, there are more than a dozen Russian-language corpora for sent...
research
05/31/2022

Uzbek Sentiment Analysis based on local Restaurant Reviews

Extracting useful information for sentiment analysis and classification ...
research
04/11/2020

Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network

Exponential growths of social media and micro-blogging sites not only pr...
research
01/15/2020

Overly Optimistic Prediction Results on Imbalanced Data: Flaws and Benefits of Applying Over-sampling

Information extracted from electrohysterography recordings could potenti...