Predicting Defective Lines Using a Model-Agnostic Technique

Defect prediction models are proposed to help a team prioritize the source code areas (files) that need Software Quality Assurance (SQA) based on their likelihood of having defects. However, developers may waste effort inspecting a whole file when only a small fraction of its source code lines are defective. Indeed, we find that as little as 1% of a file's lines are defective. Hence, in this work, we propose a novel framework (called LINE-DP) that identifies defective lines using a model-agnostic technique, i.e., an Explainable AI technique that explains why a model makes a given prediction. Broadly speaking, LINE-DP first builds a file-level defect model using code token features. Then, LINE-DP uses a state-of-the-art model-agnostic technique (i.e., LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Finally, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that LINE-DP achieves an average recall of 0.61, a false alarm rate of 0.47, and a top 20 … false alarm of 16, which are statistically better than those of six baseline approaches. Our evaluation also shows that LINE-DP requires an average computation time of 10 seconds, including model construction and defective line identification. In addition, we find that 63% of the defective lines identified by LINE-DP are related to common defects (e.g., argument change, condition change). These results suggest that LINE-DP can effectively identify defective lines that contain common defects while requiring less inspection effort and a manageable computation cost.
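The three steps above (file-level model, risky tokens, lines containing risky tokens) can be sketched in Python as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the file-level model is a hand-weighted logistic score (`TOKEN_WEIGHTS` and `BIAS` are made up), and `risky_tokens` uses a simple occlusion score (how much the predicted probability drops when every occurrence of a token is removed) in place of LIME, which instead fits a local surrogate model over many random perturbations. All helper names and the `top_k` cut-off are hypothetical. The last function computes line-level recall and false alarm rate from predicted and actual defective line numbers.

```python
import math
import re
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical file-level defect model: a logistic score over code tokens.
# These weights are invented for illustration; LINE-DP trains a real
# classifier on code token features instead.
TOKEN_WEIGHTS: Dict[str, float] = {"null": 1.2, "catch": 0.9, "size": 0.6, "return": -0.3}
BIAS = -1.0

Model = Callable[[List[str]], float]


def tokenize(code: str) -> List[str]:
    """Split code into identifier/keyword tokens (a simplification)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)


def predict_proba(tokens: List[str]) -> float:
    """Probability that a file with these tokens is defective."""
    z = BIAS + sum(TOKEN_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def risky_tokens(tokens: List[str], model: Model, top_k: int) -> List[str]:
    """Occlusion-based stand-in for LIME (NOT LIME itself): score each distinct
    token by how much the predicted probability drops when all of its
    occurrences are removed, then keep the top_k tokens with a positive score."""
    base = model(tokens)
    scores = {tok: base - model([t for t in tokens if t != tok]) for tok in set(tokens)}
    positive = sorted((t for t, s in scores.items() if s > 0), key=lambda t: (-scores[t], t))
    return positive[:top_k]


def predict_defective_lines(lines: List[str], model: Model, top_k: int = 3) -> Set[int]:
    """Return 1-based numbers of lines that contain at least one risky token."""
    tokens = [t for line in lines for t in tokenize(line)]
    if model(tokens) < 0.5:  # only explain files predicted as defective
        return set()
    risky = set(risky_tokens(tokens, model, top_k))
    return {i for i, line in enumerate(lines, start=1) if risky & set(tokenize(line))}


def recall_and_false_alarm(predicted: Set[int], actual: Set[int], n_lines: int) -> Tuple[float, float]:
    """Line-level recall = TP / (TP + FN); false alarm rate = FP / (FP + TN)."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = n_lines - tp - fp - fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return recall, false_alarm


example_lines = [
    "int size = list.size();",
    "if (obj == null) {",
    "    return 0;",
    "}",
    "try { run(); } catch (Exception e) { log(e); }",
]
predicted = predict_defective_lines(example_lines, predict_proba)
print(sorted(predicted))  # [1, 2, 5]
# Assuming lines 2 and 5 are the truly defective ones:
print(recall_and_false_alarm(predicted, actual={2, 5}, n_lines=len(example_lines)))
# recall = 1.0, false alarm rate = 1/3
```

In this toy example, `null`, `size`, and `catch` raise the file's predicted risk, so lines 1, 2, and 5 are flagged. A realistic setup would replace the hand-set weights with a trained classifier and the occlusion score with LIME's perturbation-based explanations.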
