ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

02/28/2022
by   Hossein Keshavarz, et al.

In this paper, we present ApacheJIT, a large dataset for Just-In-Time defect prediction. ApacheJIT consists of clean and bug-inducing software changes in popular Apache projects. ApacheJIT has a total of 106,674 commits (28,239 bug-inducing and 78,435 clean commits). Having a large number of commits makes ApacheJIT a suitable dataset for machine learning models, especially deep learning models that require large training sets to effectively generalize the patterns present in the historical data to future data.
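The commit counts above imply a noticeable class imbalance, which matters when training a classifier on the dataset. The following is a minimal sketch of that arithmetic, using only the figures stated in the abstract. The inverse-frequency weighting is a common heuristic assumed here for illustration, not something the abstract prescribes, and the function names are hypothetical.

```python
# Commit counts as stated in the abstract.
BUG_INDUCING = 28_239
CLEAN = 78_435
TOTAL = BUG_INDUCING + CLEAN  # should match the reported 106,674


def class_fractions(counts):
    """Return each class's share of the total number of commits."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count).

    Assumption: this heuristic is chosen for illustration only; the
    abstract does not say how (or whether) the classes are reweighted.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


counts = {"bug_inducing": BUG_INDUCING, "clean": CLEAN}
fractions = class_fractions(counts)
weights = balanced_class_weights(counts)

for label in counts:
    print(f"{label}: share={fractions[label]:.3f}, weight={weights[label]:.3f}")
```

Roughly one commit in four is bug-inducing, so a model that always predicts "clean" would already be right about three times out of four. Metrics such as precision, recall, or F1 on the bug-inducing class are therefore more informative than raw accuracy.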

research
03/08/2023

Defectors: A Large, Diverse Python Dataset for Defect Prediction

Defect prediction has been a popular research topic where machine learni...
research
05/25/2023

Too Few Bug Reports? Exploring Data Augmentation for Improved Changeset-based Bug Localization

Modern Deep Learning (DL) architectures based on transformers (e.g., BER...
research
06/20/2022

PR-SZZ: How pull requests can support the tracing of defects in software repositories

The SZZ algorithm represents a standard way to identify bug fixing commi...
research
09/25/2021

Investigation of Dataset Features for Just-in-Time Defect Prediction

Just-in-time (JIT) defect prediction refers to the technique of predicti...
research
12/07/2022

Utilizing Source Code Syntax Patterns to Detect Bug Inducing Commits using Machine Learning Models

Detecting Bug Inducing Commit (BIC) or Just in Time (JIT) defect predict...
research
12/13/2022

Fonte: Finding Bug Inducing Commits from Failures

A Bug Inducing Commit (BIC) is a commit that introduces a software bug i...
research
11/25/2019

Distortion and Faults in Machine Learning Software

Machine learning software, deep neural networks (DNN) software in partic...
