Stop Words for Processing Software Engineering Documents: Do they Matter?

03/18/2023
by Yaohou Fan, et al.

Stop words, which are considered non-predictive, are often eliminated in natural language processing tasks. However, the definition of uninformative vocabulary is vague, so most algorithms rely on general knowledge-based stop lists to remove them. Academics continue to debate the usefulness of stop word elimination, especially in domain-specific settings. In this work, we investigate the usefulness of stop word removal in a software engineering context. To do this, we replicate and experiment with three software engineering research tools from related work. Additionally, we construct a corpus of software engineering domain-related text from 10,000 Stack Overflow questions and identify 200 domain-specific stop words using traditional information-theoretic methods. Our results show that using domain-specific stop words significantly improved the performance of the research tools compared with using a general stop list, with 17 out of 19 evaluation measures showing better performance.
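To make the idea of an information-theoretic stop word ranking concrete, here is a minimal sketch. The abstract does not say which measure the authors used, so the entropy score below is an assumption chosen for illustration. The function name `rank_stop_word_candidates`, the tokenizer, and the toy documents are likewise hypothetical. A term whose occurrences are spread evenly across many documents gets high entropy and is treated as a stop word candidate.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into simple word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z][a-z0-9+#]*", text.lower())


def rank_stop_word_candidates(documents, top_k=200):
    """Rank terms by the entropy of their distribution over documents.

    Assumption: high entropy (occurrences spread evenly across documents)
    marks a term as uninformative. This is one common information-theoretic
    heuristic, not necessarily the measure used in the paper.
    """
    # term -> Counter mapping document index to occurrences in that document
    per_doc_counts = defaultdict(Counter)
    for doc_index, doc in enumerate(documents):
        for token in tokenize(doc):
            per_doc_counts[token][doc_index] += 1

    scores = {}
    for term, counts in per_doc_counts.items():
        total = sum(counts.values())
        # Shannon entropy of the term's occurrence distribution over documents
        scores[term] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )

    # Highest entropy first; ties broken alphabetically for determinism
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy stand-in for a corpus of Stack Overflow questions
    docs = [
        "the code throws an error",
        "the function returns the code",
        "how to fix the code in python",
    ]
    print(rank_stop_word_candidates(docs, top_k=2))
```

In this toy corpus, "code" appears exactly once in every document, so it scores at least as high as a general stop word like "the". That is the kind of domain-specific term a general stop list would miss.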


