Use of Source Code Similarity Metrics in Software Defect Prediction

08/29/2018
by Ahmet Okutan, et al.

In recent years, defect prediction has received a great deal of attention in empirical software engineering. Predicting software defects before the maintenance phase is important not only to decrease maintenance costs but also to increase the overall quality of a software product. Many product, process, and developer based software metrics have been proposed so far to measure the defectiveness of a software system. This paper proposes a novel set of software metrics based on the similarities detected among the source code files in a software project. To find source code similarities among the files of a software system, plagiarism and clone detection techniques are used. Two simple similarity metrics are calculated for each file, considering its overall similarity to the defective and non-defective files in the project. Using these similarity metrics, we predict whether a specific file is defective or not. Our experiments on 10 open-source data sets show that, depending on the amount of detected similarity, the proposed metrics could achieve significantly better performance than the existing static code metrics in terms of the area under the curve (AUC).
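The two per-file similarity metrics described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas or name the plagiarism and clone detection tools, so everything here is an assumption: `difflib.SequenceMatcher` stands in for a real similarity detector, each metric is taken to be a plain sum of pairwise similarities, and the function names `pairwise_similarity` and `similarity_metrics` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Set, Tuple


def pairwise_similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1] (assumption: the paper uses
    plagiarism/clone detection tools, not difflib)."""
    return SequenceMatcher(None, a, b).ratio()


def similarity_metrics(
    sources: Dict[str, str], defective: Set[str]
) -> Dict[str, Tuple[float, float]]:
    """For each file, return (similarity to defective files,
    similarity to non-defective files).

    Assumption: each metric is the sum of pairwise similarities to the
    other files in that group; the file itself is excluded.
    """
    metrics: Dict[str, Tuple[float, float]] = {}
    for name, text in sources.items():
        sim_defective = 0.0
        sim_non_defective = 0.0
        for other, other_text in sources.items():
            if other == name:
                continue  # do not compare a file with itself
            score = pairwise_similarity(text, other_text)
            if other in defective:
                sim_defective += score
            else:
                sim_non_defective += score
        metrics[name] = (sim_defective, sim_non_defective)
    return metrics


# Toy project: file "b" is labeled defective, "a" and "c" are not.
toy_sources = {"a": "int x = 1;", "b": "int x = 1;", "c": "zzz"}
toy_metrics = similarity_metrics(toy_sources, defective={"b"})
```

The resulting pair of values per file could then serve as the feature vector for any binary classifier, whose ranking quality would be evaluated with AUC as in the abstract.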


research
09/08/2021

On the differences between quality increasing and other changes in open source Java projects

Static software metrics, e.g., size, complexity and coupling are used in...
research
03/02/2021

Apples, Oranges Fruits – Understanding Similarity of Software Projects Through The Lens of Dissimilar Artifacts

The growing availability of open source projects has facilitated develop...
research
04/01/2022

A Large-scale Dataset of (Open Source) License Text Variants

We introduce a large-scale dataset of the complete texts of free/open so...
research
12/16/2017

Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017

We report on the Wikidata vandalism detection task at the WSDM Cup 2017....
research
02/12/2020

Multi-Objective Optimization for Token-Based Clone Detection

Clone detection plays an important role in software engineering. Finding...
research
12/22/2021

End to End Software Engineering Research

End to end learning is machine learning starting in raw data and predict...
research
08/20/2018

Leveraging Historical Associations between Requirements and Source Code to Identify Impacted Classes

As new requirements are introduced and implemented in a software system,...
