Revisiting Process versus Product Metrics: a Large Scale Analysis

08/21/2020
by Suvodeep Majumder, et al.

Numerous methods can build predictive models from software data. But which methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (best process/product-based learners respectively achieve recalls of 98...). That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies, since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single-model predictions can become very confused and exhibit very high variance).
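The last point, combining predictions from multiple models, can be illustrated with a minimal sketch. It assumes a simple majority vote over binary defect predictions, which is a stand-in chosen for illustration and not necessarily the combination scheme used in the paper. The labels, the three sets of model predictions, and the helper names `recall` and `majority_vote` are all hypothetical toy values, not results from the study.

```python
def recall(actual, predicted):
    """Fraction of truly defective items (label 1) that were predicted defective."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    positives = sum(actual)
    return true_pos / positives if positives else 0.0


def majority_vote(predictions_per_model):
    """Combine binary predictions from several models, item by item.

    An item is flagged defective when more than half of the models flag it.
    """
    n_models = len(predictions_per_model)
    return [1 if sum(votes) * 2 > n_models else 0
            for votes in zip(*predictions_per_model)]


# Toy, made-up data: 1 = defective, 0 = clean (not taken from the paper).
actual = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predictions from three separately trained models.
model_predictions = [
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
]

single_recalls = [recall(actual, p) for p in model_predictions]
combined = majority_vote(model_predictions)
combined_recall = recall(actual, combined)

print("single-model recalls:", single_recalls)
print("majority-vote recall:", combined_recall)
```

In this toy case each model misses a different defective item, so the vote recovers all of them. With real project data the benefit depends on how correlated the individual models' errors are.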


