On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

07/06/2017
by Jose Camacho-Collados, et al.

In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a state-of-the-art text classifier based on convolutional neural networks. Although these decisions can affect the final performance of any given model, they have received little attention in the deep learning literature. We perform an extensive evaluation on standard benchmarks for text categorization and sentiment analysis. Our results show that simply tokenizing the input text is often enough, but they also highlight the importance of preprocessing the evaluation set and the corpus used for training word embeddings consistently.
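The four preprocessing decisions named above can be illustrated with a small sketch. It is not the authors' pipeline: the regex tokenizer, the toy lemma dictionary `TOY_LEMMAS`, the multiword lexicon `TOY_MULTIWORDS` and the embedding vocabulary `EMBEDDING_VOCAB` are all hypothetical stand-ins (a real setup would use a proper lemmatizer, a curated multiword lexicon and the vocabulary of actual pretrained embeddings). The `coverage` helper is likewise only an illustrative proxy for the consistency point: it measures how many evaluation tokens are found in the embedding vocabulary when the evaluation text is preprocessed the same way as, or differently from, the embedding corpus.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy resources standing in for a real lemmatizer and multiword lexicon.
TOY_LEMMAS: Dict[str, str] = {"movies": "movie", "loved": "love", "was": "be"}
TOY_MULTIWORDS: Set[Tuple[str, ...]] = {("new", "york"), ("science", "fiction")}

# Hypothetical embedding vocabulary, assumed to come from a tokenized, lowercased corpus.
EMBEDDING_VOCAB: Set[str] = {
    "i", "loved", "the", "science", "fiction", "movies", "in", "new", "york", ".",
}

TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    return TOKEN_RE.findall(text)


def group_multiwords(tokens: List[str], lexicon: Set[Tuple[str, ...]]) -> List[str]:
    """Greedily join known multiword expressions with '_' (longest match first)."""
    max_len = max((len(m) for m in lexicon), default=1)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lexicon:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword starts here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


def preprocess(text: str, lowercase: bool = False, lemmatize: bool = False,
               multiwords: bool = False) -> List[str]:
    """Tokenize, then optionally group multiwords, lowercase and lemmatize."""
    tokens = tokenize(text)
    if multiwords:
        tokens = group_multiwords(tokens, TOY_MULTIWORDS)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if lemmatize:
        tokens = [TOY_LEMMAS.get(t, t) for t in tokens]
    return tokens


def coverage(tokens: Iterable[str], vocab: Set[str]) -> float:
    """Fraction of tokens that have an entry in the embedding vocabulary."""
    tokens = list(tokens)
    return sum(t in vocab for t in tokens) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sentence = "I loved the science fiction movies in New York."
    # Same preprocessing as the (hypothetical) embedding corpus: full coverage.
    print(coverage(preprocess(sentence, lowercase=True), EMBEDDING_VOCAB))  # 1.0
    # No lowercasing: "I", "New" and "York" are out of vocabulary.
    print(coverage(preprocess(sentence), EMBEDDING_VOCAB))  # 0.7
    # Lemmatizing only the evaluation text: "love" and "movie" are out of vocabulary.
    print(coverage(preprocess(sentence, lowercase=True, lemmatize=True), EMBEDDING_VOCAB))  # 0.8
```

In this toy setting the mismatched variants lose vocabulary coverage even though they are, in isolation, reasonable preprocessing choices. Coverage is only a rough proxy here; it is not the classification accuracy reported in the paper.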

research
02/18/2019

Investigating the Effect of Segmentation Methods on Neural Model based Sentiment Analysis on Informal Short Texts in Turkish

This work investigates segmentation approaches for sentiment analysis on...
research
04/18/2020

A Hybrid Approach for Aspect-Based Sentiment Analysis Using Deep Contextual Word Embeddings and Hierarchical Attention

The Web has become the main platform where people express their opinions...
research
10/17/2017

RETUYT in TASS 2017: Sentiment Analysis for Spanish Tweets using SVM and CNN

This article presents classifiers based on SVM and Convolutional Neural ...
research
08/08/2018

Exploiting Effective Representations for Chinese Sentiment Analysis Using a Multi-Channel Convolutional Neural Network

Effective representation of a text is critical for various natural langu...
research
05/18/2018

Aspect Based Sentiment Analysis with Gated Convolutional Networks

Aspect based sentiment analysis (ABSA) can provide more detailed informa...
research
02/05/2015

Text Understanding from Scratch

This article demonstrates that we can apply deep learning to text underst...
research
10/06/2018

Text-based Sentiment Analysis and Music Emotion Recognition

Sentiment polarity of tweets, blog posts or product reviews has become h...