An empirical evaluation of the usefulness of Tree Kernels for Commit-time Defect Detection in large software systems

06/21/2021
by Hareem Sahar, et al.

Defect detection at commit check-in time prevents the introduction of defects into software systems. Current defect detection approaches rely on metric-based models, which are not very accurate and whose results are not directly useful for developers. We propose a method to detect bug-inducing commits by comparing the incoming changes with all past commits in the project, considering both those that introduced defects and those that did not. Our method considers the individual changes in a commit separately, at method-level granularity. Doing so helps developers, as they are told which specific methods need further attention instead of being told that the entire commit is problematic. Our approach represents source code as abstract syntax trees and uses tree kernels to estimate the similarity of the code to previous commits. We experiment with subtree kernels (STK), subset tree kernels (SSTK), and partial tree kernels (PTK). An incoming change is then classified using a K-NN classifier on the past changes. We evaluate our approach on the BigCloneBench benchmark and on the Technical Debt dataset, using the NiCad clone detector as the baseline. Our experiments with the BigCloneBench benchmark show that the tree kernel approach can detect clones with a MAP comparable to that of NiCad. On defect detection with the Technical Debt dataset, tree kernels are at least as effective as NiCad, with MRR, F-score, and Accuracy of 0.87, 0.80, and 0.82 respectively.
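
The pipeline described in the abstract (parse a changed method into an abstract syntax tree, score its similarity to past changes with a tree kernel, then classify with K-NN) can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only a subtree-kernel (STK) style comparison, it uses Python's standard `ast` module as a stand-in parser, and the function names (`subtree_counts`, `subtree_kernel`, `similarity`, `knn_label`) and the labelled history used with it are hypothetical, chosen for illustration.

```python
import ast
from collections import Counter
from math import sqrt


def subtree_counts(source: str) -> Counter:
    """Count every complete subtree of the AST, keyed by a canonical string.

    Only node types are compared; identifier names and literal values are
    ignored (an assumption of this sketch, not a claim about the paper).
    """
    counts = Counter()

    def visit(node: ast.AST) -> str:
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        key = type(node).__name__ + "(" + ",".join(children) + ")"
        counts[key] += 1
        return key

    visit(ast.parse(source))
    return counts


def subtree_kernel(a: Counter, b: Counter) -> int:
    """Number of matching pairs of complete subtrees between two trees."""
    return sum(n * b[key] for key, n in a.items() if key in b)


def similarity(a: Counter, b: Counter) -> float:
    """Kernel value normalised to [0, 1] so that tree size does not dominate."""
    return subtree_kernel(a, b) / sqrt(subtree_kernel(a, a) * subtree_kernel(b, b))


def knn_label(change: str, history: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k past changes most similar to `change`."""
    query = subtree_counts(change)
    ranked = sorted(
        history,
        key=lambda item: similarity(query, subtree_counts(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Given a list of `(method_source, label)` pairs from past commits, `knn_label(new_method_source, history)` returns the majority label of the nearest neighbours. Normalising the kernel keeps large methods from dominating the ranking simply because they contain more subtrees. The subset tree and partial tree kernels mentioned above match fragments more loosely than complete subtrees and are not covered by this sketch.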

research
03/29/2023

An AST-based Code Change Representation and its Performance in Just-in-time Vulnerability Prediction

The presence of software vulnerabilities is an ever-growing issue in sof...
research
04/23/2023

U Owns the Code That Changes and How Marginal Owners Resolve Issues Slower in Low-Quality Source Code

[Context] Accurate time estimation is a critical aspect of predictable s...
research
12/06/2019

ATOM: Commit Message Generation Based on Abstract Syntax Tree and Hybrid Ranking

Commit messages record code changes (e.g., feature modifications and bug...
research
07/30/2018

Towards an automated approach for bug fix pattern detection

The characterization of bug datasets is essential to support the evaluat...
research
04/01/2021

Assessing the Exposure of Software Changes: The DiPiDi Approach

Context: Changing a software application with many build-time configurat...
research
03/05/2021

Does chronology matter in JIT defect prediction? A Partial Replication Study

Just-In-Time (JIT) models detect the fix-inducing changes (or defect-ind...
research
09/10/2019

LVMapper: A Large-variance Clone Detector Using Sequencing Alignment Approach

To detect large-variance code clones (i.e. clones with relatively more d...
