Is word segmentation necessary for Vietnamese sentiment classification?

01/01/2023
by Duc-Vu Nguyen, et al.

To the best of our knowledge, this paper is the first attempt to answer whether word segmentation is necessary for Vietnamese sentiment classification. To this end, we present five pre-trained monolingual S4-based language models for Vietnamese: one model without word segmentation, and four models that use the RDRsegmenter, uitnlp, pyvi, or underthesea toolkit in the data pre-processing phase. Based on comprehensive experimental results on two corpora, the VLSP2016-SA corpus of technical article reviews from news and social media and the UIT-VSFC corpus of educational surveys, we make two suggestions. First, with traditional classifiers such as Naive Bayes or Support Vector Machines, word segmentation may not be necessary for Vietnamese sentiment classification corpora that come from the social domain. Second, word segmentation is necessary for Vietnamese sentiment classification when it is applied before the BPE method and the result is fed into a deep learning model. In this setting, RDRsegmenter is the most stable word segmentation toolkit compared with uitnlp, pyvi, and underthesea.
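
To make the two pre-processing settings concrete, the sketch below contrasts syllable-level tokens (no word segmentation) with word-segmented tokens, where the syllables of a multi-syllable word are joined by an underscore, as Vietnamese segmenters commonly do in their output. It is a toy illustration only, not the method of any toolkit named above: the greedy longest-match rule, the tiny lexicon, and the function names `syllable_tokens` and `segment` are all assumptions made for this example.

```python
import unicodedata

# Hypothetical mini-lexicon of multi-syllable words (an assumption for this
# sketch; real segmenters rely on trained models, not a hand-written list).
LEXICON = {
    tuple(unicodedata.normalize("NFC", w).split())
    for w in ["giảng viên", "sinh viên", "nhiệt tình"]
}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def syllable_tokens(sentence):
    """No word segmentation: split on whitespace, one token per syllable."""
    return unicodedata.normalize("NFC", sentence).lower().split()


def segment(sentence):
    """Toy greedy longest-match segmentation over LEXICON.

    Syllables that form a lexicon word are joined with '_';
    everything else is kept as a single-syllable token.
    """
    syllables = syllable_tokens(sentence)
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, fall back to one syllable.
        for n in range(min(MAX_WORD_LEN, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if n == 1 or candidate in LEXICON:
                words.append("_".join(candidate))
                i += n
                break
    return words


sentence = "Giảng viên rất nhiệt tình"
print(syllable_tokens(sentence))  # input as seen without word segmentation
print(segment(sentence))          # input as seen with word segmentation
```

Either token sequence could then be passed to a bag-of-words classifier or to a subword method such as BPE; the second suggestion above concerns the case where segmentation happens before BPE.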

research
06/18/2019

State-of-the-Art Vietnamese Word Segmentation

Word segmentation is the first step of any tasks in Vietnamese language ...
research
10/02/2019

The merits of Universal Language Model Fine-tuning for Small Datasets – a case with Dutch book reviews

We evaluated the effectiveness of using language models, that were pre-t...
research
08/17/2016

SlangSD: Building and Using a Sentiment Dictionary of Slang Words for Short-Text Sentiment Classification

Sentiment in social media is increasingly considered as an important res...
research
11/05/2015

An Empirical Study on Sentiment Classification of Chinese Review using Word Embedding

In this article, how word embeddings can be used as features in Chinese ...
research
04/14/2020

Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text

Chinese word segmentation is necessary to provide word-level information...
research
07/10/2018

Paired Comparison Sentiment Scores

The method of paired comparisons is an established method in psychology....
research
04/21/2020

Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus

This paper focuses on how to extract opinions over each Persian sentence...