The Early Bird Catches the Worm: Better Early Life Cycle Defect Predictors

05/24/2021
by N. C. Shrikanth, et al.

Before researchers rush to reason across all available data, they should first check if the information is densest within some small region. We say this since, in 240 GitHub projects, we find that the information in that data “clumps” towards the earliest parts of the project. In fact, a defect prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this early life cycle data, we can build models very quickly (using weeks, not months, of CPU time). Also, we can find simple models (with just two features) that generalize to hundreds of software projects. Based on this experience, we warn that prior work on generalizing software engineering defect prediction models may have needlessly complicated an inherently simple process. Further, prior work that focused on later life cycle data now needs to be revisited, since its conclusions were drawn from relatively uninformative regions. Replication note: all our data and scripts are online at https://github.com/snaraya7/early-defect-prediction-tse.
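To make the "first 150 commits" idea concrete, here is a minimal sketch of a chronological split: the earliest commits of a project form the training set, and everything later is held out for evaluation. It is an illustration under stated assumptions, not the authors' pipeline (their scripts are in the repository linked above). The `Commit` record, its field names, and the `early_split` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Commit:
    # Hypothetical record layout; the field names are assumptions for this sketch.
    timestamp: int                 # commit time, e.g. Unix seconds
    features: Tuple[float, ...]    # per-commit metrics (unspecified here)
    defective: bool                # label: did this commit introduce a defect?


def early_split(commits: List[Commit], n_early: int = 150) -> Tuple[List[Commit], List[Commit]]:
    """Return (early, later): the first n_early commits by time, then the rest."""
    # Sort by time so "early" means early in the project's life,
    # regardless of the order in which the commits were collected.
    ordered = sorted(commits, key=lambda c: c.timestamp)
    return ordered[:n_early], ordered[n_early:]
```

Any learner can then be fit on the early portion only and scored on the later portion, which is how one would check whether the early data is as informative as the abstract claims.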


