The Early Bird Catches the Worm: Better Early Life Cycle Defect Predictors

05/24/2021
by N. C. Shrikanth, et al.

Before researchers rush to reason across all available data, they should first check if the information is densest within some small region. We say this since, in 240 GitHub projects, we find that the information in that data “clumps” towards the earliest parts of the project. In fact, a defect prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this early life cycle data, we can build models very quickly (using weeks, not months, of CPU time). Also, we can find simple models (with just two features) that generalize to hundreds of software projects. Based on this experience, we warn that prior work on generalizing software engineering defect prediction models may have needlessly complicated an inherently simple process. Further, prior work that focused on later life cycle data now needs to be revisited since its conclusions were drawn from relatively uninformative regions. Replication note: all our data and scripts are online at https://github.com/snaraya7/early-defect-prediction-tse.
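The core idea in the abstract (learn from only the first 150 commits, then predict on everything later) can be illustrated with a minimal sketch. This is not the authors' implementation; their scripts are in the repository linked above. Everything here is an assumption made for illustration: the `Commit` tuple layout, the helper names (`early_split`, `fit_logistic`, `recall`, `make_synthetic_commits`), the choice of plain logistic regression as the learner, and the synthetic two-feature data, which is not drawn from the 240 GitHub projects.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical commit record: (timestamp, features, is_defective).
Commit = Tuple[int, Sequence[float], int]


def early_split(commits: List[Commit], n_early: int = 150):
    """Order commits by time; return (first n_early commits, all later commits)."""
    ordered = sorted(commits, key=lambda c: c[0])
    return ordered[:n_early], ordered[n_early:]


def _sigmoid(z: float) -> float:
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(train: List[Commit], epochs: int = 200, lr: float = 0.1):
    """Logistic regression via plain stochastic gradient descent (illustrative)."""
    weights = [0.0] * len(train[0][1])
    bias = 0.0
    for _ in range(epochs):
        for _, x, y in train:
            p = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x: Sequence[float]) -> int:
    weights, bias = model
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if _sigmoid(score) >= 0.5 else 0


def recall(model, commits: List[Commit]) -> float:
    """Fraction of truly defective commits that the model flags."""
    defective = [c for c in commits if c[2] == 1]
    if not defective:
        return 0.0
    return sum(predict(model, c[1]) for c in defective) / len(defective)


def make_synthetic_commits(n: int = 600, seed: int = 1) -> List[Commit]:
    """Synthetic stand-in data with two features; NOT real project data."""
    rng = random.Random(seed)
    commits = []
    for t in range(n):
        defective = rng.random() < 0.3
        lo, hi = (0.6, 1.0) if defective else (0.0, 0.4)
        features = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        commits.append((t, features, int(defective)))
    rng.shuffle(commits)  # arrival order is not assumed to be sorted
    return commits


if __name__ == "__main__":
    early, later = early_split(make_synthetic_commits(), n_early=150)
    model = fit_logistic(early)  # train on early life cycle data only
    print("recall on later commits:", recall(model, later))
```

The split is strictly chronological, so no later commit can leak into training. Swapping in real commit-level features and a different learner only requires replacing `make_synthetic_commits` and `fit_logistic`.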

11/26/2020

Early Life Cycle Software Defect Prediction. Why? How?

Many researchers assume that, for software analytics, "more data is bett...
08/21/2020

Revisiting Process versus Product Metrics: a Large Scale Analysis

Numerous methods can build predictive models from software data. But wha...
11/06/2019

Learning GENERAL Principles from Hundreds of Software Projects

When one exemplar project, which we call the "bellwether", offers the be...
11/14/2019

On the Time-Based Conclusion Stability of Software Defect Prediction Models

Researchers in empirical software engineering often make claims based on...
05/17/2021

Deep Learning Models in Software Requirements Engineering

Requirements elicitation is an important phase of any software project: ...
04/08/2022

End-of-Life of Software How is it Defined and Managed?

The rapid development of new software and algorithms, fueled by the imme...
10/26/2020

Deep reinforced learning enables solving discrete-choice life cycle models to analyze social security reforms

Discrete-choice life cycle models can be used to, e.g., estimate how soc...