Utilizing Source Code Syntax Patterns to Detect Bug Inducing Commits using Machine Learning Models

12/07/2022
by Md Nadim, et al.

Detecting Bug Inducing Commits (BIC), or Just-in-Time (JIT) defect prediction, using Machine Learning (ML) based models requires tabulated feature values extracted from the source code or historical maintenance data of a software system. Existing studies have utilized metadata from source code repositories (which we name GitHub Statistics, or GS), n-gram-based source code text processing, and developer information (e.g., the experience of a developer) as feature values in ML-based bug detection models. However, these feature values do not represent the source code syntax styles or patterns that a developer might prefer over the valid alternatives a programming language provides. This investigation proposes a method to extract features from source code syntax patterns to represent software commits, and investigates whether these features help detect bug proneness in software systems. We utilize six manually labeled and two automatically labeled datasets from eight open-source software projects written in Java, C++, and Python. Our datasets contain 642 manually labeled and 4,014 automatically labeled buggy and non-buggy commits from six and two subject systems, respectively. The subject systems contain a diverse number of revisions and come from various application domains. Our investigation shows that including the proposed features increases the performance of detecting buggy and non-buggy software commits with five different machine learning classification models. Our proposed features also perform better in detecting buggy commits when using Deep Belief Network generated features and a classification model. This investigation also implemented a state-of-the-art tool to compare the explainability of predicted buggy commits using our proposed features and traditional features, and found that our proposed features provide better reasoning about buggy.....
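The abstract does not spell out how syntax patterns are turned into tabulated feature values, so the following is only a minimal sketch of the general idea, under stated assumptions: it uses Python's standard `ast` module to count syntax node types in a code fragment, and the helper names `syntax_pattern_counts` and `to_feature_vector`, the sample fragments, and the chosen vocabulary are all hypothetical illustrations, not the authors' extraction method.

```python
import ast
from collections import Counter


def syntax_pattern_counts(source: str) -> Counter:
    """Count how often each AST node type occurs in a Python fragment.

    Hypothetical helper: the paper's actual feature extraction may differ.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def to_feature_vector(counts: Counter, vocabulary: list) -> list:
    """Turn node-type counts into a fixed-order row of feature values."""
    return [counts.get(name, 0) for name in vocabulary]


# Two valid alternatives a developer might choose for the same task.
loop_style = "result = []\nfor x in items:\n    result.append(x * 2)\n"
comprehension_style = "result = [x * 2 for x in items]\n"

# Assumed vocabulary of syntax node types used as feature columns.
vocabulary = ["For", "ListComp", "Call", "BinOp"]

loop_features = to_feature_vector(syntax_pattern_counts(loop_style), vocabulary)
comp_features = to_feature_vector(
    syntax_pattern_counts(comprehension_style), vocabulary
)

print(dict(zip(vocabulary, loop_features)))
print(dict(zip(vocabulary, comp_features)))
```

The two fragments compute the same result but yield different rows, which is the kind of stylistic signal that repository metadata or developer information alone would not capture. In a commit-level setting, rows like these could plausibly be computed on the changed code and appended to the traditional features before training a classifier, though that pipeline is an assumption here.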

