Quantitative Stopword Generation for Sentiment Analysis via Recursive and Iterative Deletion

09/04/2022
by Daniel M. DiPietro, et al.

Stopwords carry little semantic information and are often removed from text data to reduce dataset size and improve machine learning model performance. Consequently, researchers have sought to develop techniques for generating effective stopword sets. Previous approaches range from qualitative techniques that rely on linguistic experts to statistical approaches that extract word importance using correlations or frequency-dependent metrics computed on a corpus. We present a novel quantitative approach that employs iterative and recursive feature deletion algorithms to determine which words can be deleted from a pre-trained transformer's vocabulary with the least degradation to its performance, specifically for the task of sentiment analysis. Empirically, stopword lists generated via this approach drastically reduce dataset size while negligibly impacting model performance, in one such example shrinking the corpus by 28.4% [...] logistic regression model by 0.25% [...] by 63.7% [...] that our approach can generate highly effective stopword sets for specific NLP tasks.
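The abstract does not spell out the deletion algorithms, so the following is only a minimal sketch of one plausible reading of the iterative variant: delete each word on its own, measure the drop in a task score, then greedily keep the deletions that stay within a tolerance. The names `toy_accuracy`, `iterative_deletion`, and `tolerance` are hypothetical, and the tiny lexicon classifier is an assumed stand-in for the pre-trained transformer used in the paper.

```python
from typing import Callable, FrozenSet, Iterable, List, Set


def toy_accuracy(docs: List[str], labels: List[int], removed: FrozenSet[str]) -> float:
    """Hypothetical scorer: accuracy of a tiny lexicon classifier after
    deleting the words in `removed` (a stand-in for a real model)."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    correct = 0
    for text, label in zip(docs, labels):
        tokens = [t for t in text.split() if t not in removed]
        score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
        # A score of zero means no sentiment evidence; count it as a miss.
        pred = 1 if score > 0 else (0 if score < 0 else -1)
        correct += int(pred == label)
    return correct / len(docs)


def iterative_deletion(
    vocab: Iterable[str],
    score_fn: Callable[[FrozenSet[str]], float],
    tolerance: float = 0.0,
) -> Set[str]:
    """Assumed procedure: rank words by the score drop caused by deleting
    each one alone, then greedily accept deletions within `tolerance`."""
    baseline = score_fn(frozenset())
    drops = {w: baseline - score_fn(frozenset({w})) for w in set(vocab)}
    ranked = sorted(drops, key=lambda w: (drops[w], w))  # least harmful first
    stopwords: Set[str] = set()
    for word in ranked:
        trial = stopwords | {word}
        if baseline - score_fn(frozenset(trial)) <= tolerance:
            stopwords = trial
    return stopwords


docs = ["the movie was good", "the movie was bad", "i love the cast", "i hate the plot"]
labels = [1, 0, 1, 0]
vocab = {tok for d in docs for tok in d.split()}
stopwords = iterative_deletion(vocab, lambda removed: toy_accuracy(docs, labels, removed))
print(sorted(stopwords))  # -> ['cast', 'i', 'movie', 'plot', 'the', 'was']
```

On this toy data the sentiment-bearing words survive because deleting any of them lowers accuracy, while the remaining words are flagged as stopwords. A real run would replace `toy_accuracy` with an evaluation of the actual model on held-out data.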

