IRJIT – An Information Retrieval Technique for Just-in-time Defect Identification

10/05/2022
by   Hareem Sahar, et al.

Defect identification at commit check-in time prevents the introduction of defects into software. Current defect identification approaches either rely on manually crafted features such as change metrics or involve training expensive machine learning or deep learning models. Because they rely on complex underlying models, these approaches are often not explainable, which means developers cannot understand the models' predictions. An approach that is not explainable might not be adopted in real-life development environments because developers do not trust its results. Furthermore, because of their extensive training process, these approaches cannot readily learn from new examples as they arrive, making them unsuitable for fast online prediction. To address these limitations, we propose an approach called IRJIT that employs information retrieval on source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits. Our approach is online and explainable: it can learn from new data without retraining, and developers can see the documents that support a prediction. Through an evaluation on 8 open-source projects, we show that IRJIT achieves AUC and F1 scores close to those of the state-of-the-art machine learning approach JITLine, without considerable re-training.
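
To make the idea in the abstract concrete, here is a minimal sketch of similarity-based commit labeling. It is not the IRJIT implementation: the abstract does not state the retrieval model, the number of neighbours, or the voting rule, so the TF-IDF weighting, cosine similarity, `k=3`, the similarity-weighted vote, the "clean" fallback, and all names (`CommitIndex`, `tokenize`, `add`, `predict`) are assumptions chosen for illustration, and the sample commits are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    """Split changed source text into lowercase identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", code.lower())


class CommitIndex:
    """Hypothetical index of past commits labeled 'buggy' or 'clean'."""

    def __init__(self):
        self.docs = []       # (raw text, term counts, label)
        self.df = Counter()  # number of indexed commits containing each term

    def add(self, code, label):
        # Learning from a new example is just indexing it; nothing is retrained.
        tf = Counter(tokenize(code))
        self.docs.append((code, tf, label))
        self.df.update(tf.keys())

    def _vector(self, tf):
        # Smoothed TF-IDF weights (an assumed weighting, not taken from the paper).
        n = len(self.docs)
        return {t: c * (math.log((1 + n) / (1 + self.df[t])) + 1.0)
                for t, c in tf.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b[t] for t, w in a.items() if t in b)
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    def predict(self, code, k=3):
        """Return (label, evidence); evidence lists (similarity, label, text)."""
        query = self._vector(Counter(tokenize(code)))
        scored = []
        for text, tf, label in self.docs:
            sim = self._cosine(query, self._vector(tf))
            if sim > 0:
                scored.append((sim, label, text))
        scored.sort(key=lambda hit: hit[0], reverse=True)
        evidence = scored[:k]
        if not evidence:
            return "clean", []  # arbitrary fallback when nothing matches
        votes = defaultdict(float)
        for sim, label, _ in evidence:
            votes[label] += sim  # similarity-weighted vote (assumed rule)
        return max(votes, key=votes.get), evidence


index = CommitIndex()
index.add("if (ptr == NULL) return ptr->value;", "buggy")
index.add("for (i = 0; i <= len; i++) buf[i] = 0;", "buggy")
index.add("log.info('starting server on port')", "clean")
index.add("update readme and fix docs typo", "clean")

label, evidence = index.predict("return ptr->value when ptr is NULL")

# Online update: index one more labeled commit, then query again immediately.
index.add("logger.debug('cache hit for key')", "clean")
label_after_update, evidence_after_update = index.predict("logger.debug cache hit")
```

The `evidence` list is what makes a prediction inspectable in this sketch: each entry is a past commit, with its label and similarity score, that contributed to the vote.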


