Investigation of Dataset Features for Just-in-Time Defect Prediction

by Giuseppe Ng, et al.

Just-in-time (JIT) defect prediction is the technique of predicting whether a code change is defective. Many contributions in this area have been made using the excellent dataset by Kamei. In this paper, we first revisit the dataset, highlighting the difficulties of preprocessing it and its limitations for unsupervised learning. Second, we propose certain features in the Kamei dataset that can be used for training models. Lastly, we discuss the limitations of the dataset's features.



