Natural Language or Not (NLoN) - A Package for Software Engineering Text Analysis Pipeline

03/20/2018
by Mika V. Mäntylä, et al.

The use of natural language processing (NLP) is gaining popularity in software engineering. To perform NLP correctly, we must pre-process the textual information to separate natural language from other content, such as log messages, that is often part of communication in software engineering. We present a simple approach for classifying whether a textual input is natural language or not. Although our NLoN package relies on only 11 language features and character tri-grams, it achieves an area under the ROC curve between 0.976 and 0.987 on three different data sources, with Lasso regression from Glmnet as the learner and two human raters providing the ground truth. Cross-source prediction performance is lower and fluctuates more, with the top ROC performances ranging from 0.913 to 0.980. Compared with prior work, our approach offers similar performance but is considerably more lightweight, making it easier to apply in software engineering text mining pipelines. Our source code and data are provided as an R package for further improvement.
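
The feature side of such a classifier can be sketched roughly as follows. This is a minimal Python illustration and not the NLoN R package's code: the function names `char_trigrams` and `simple_features`, the four line-level features, and the two sample lines are all assumptions made for illustration, and the learner (Lasso regression) is left out entirely.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count overlapping character tri-grams in a line of text."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def simple_features(text: str) -> dict:
    """Hypothetical line-level features; not the package's actual feature list."""
    n = max(len(text), 1)  # avoid division by zero on empty input
    return {
        "ratio_caps": sum(c.isupper() for c in text) / n,
        "ratio_digits": sum(c.isdigit() for c in text) / n,
        "ratio_special": sum(not c.isalnum() and not c.isspace() for c in text) / n,
        "n_words": len(text.split()),
    }


# Invented sample inputs: one line of prose and one log-like line.
prose = "I think this patch fixes the crash."
log_line = "2018-03-20 12:01:44 ERROR [main] NullPointerException at Foo.java:42"

print(char_trigrams("log4j"))
print(simple_features(prose))
print(simple_features(log_line))
```

In a full pipeline, the tri-gram counts and line-level features would be combined into one sparse feature vector per line and passed to an L1-regularized classifier, which keeps only the features that help separate prose from non-prose.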


research
04/06/2021

CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing

Currently, a growing number of mature natural language processing applic...
research
10/20/2021

JavaBERT: Training a transformer-based model for the Java programming language

Code quality is and will be a crucial factor while developing new softwa...
research
02/05/2021

Understanding Emails and Drafting Responses – An Approach Using GPT-3

Providing computer systems with the ability to understand and generate n...
research
07/26/2019

Exploranative Code Quality Documents

Good code quality is a prerequisite for efficiently developing maintaina...
research
05/08/2009

The Modular Audio Recognition Framework (MARF) and its Applications: Scientific and Software Engineering Notes

MARF is an open-source research platform and a collection of voice/sound...
research
08/31/2018

Total Recall, Language Processing, and Software Engineering

A broad class of software engineering problems can be generalized as the...
research
05/12/2021

Assessing Semantic Frames to Support Program Comprehension Activities

Software developers often rely on natural language text that appears in ...
