Revisiting Process versus Product Metrics: a Large Scale Analysis

08/21/2020
by Suvodeep Majumder, et al.

Numerous methods can build predictive models from software data. But what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (best process/product-based learners respectively achieve recalls of 98 [...]). That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies, since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single-model predictions can become very confused and exhibit very high variance).
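
To make the last point concrete, here is a minimal sketch of combining defect predictions from several models and scoring the result by recall. The abstract does not say how predictions from multiple models are combined, so the majority vote used here is an assumption chosen for illustration, not the authors' method. The function names `recall` and `majority_vote` and the toy labels are likewise hypothetical.

```python
def recall(actual, predicted):
    """Fraction of truly defective items (label 1) that were predicted defective."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    positives = sum(1 for a in actual if a == 1)
    return true_pos / positives if positives else 0.0


def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from several models.

    Assumption: simple majority voting; ties count as non-defective.
    """
    n_models = len(predictions_per_model)
    return [1 if sum(votes) * 2 > n_models else 0
            for votes in zip(*predictions_per_model)]


# Toy data invented for illustration (1 = defective, 0 = clean).
actual = [1, 0, 1, 1, 0, 1]
model_predictions = [
    [1, 0, 0, 1, 0, 1],  # hypothetical model A
    [1, 1, 1, 0, 0, 1],  # hypothetical model B
    [0, 0, 1, 1, 0, 1],  # hypothetical model C
]

combined = majority_vote(model_predictions)
per_model_recall = [recall(actual, p) for p in model_predictions]
combined_recall = recall(actual, combined)

print("per-model recall:", per_model_recall)
print("combined recall:", combined_recall)
```

In this toy case each model misses a different defect, so the vote recovers all of them. Real models often make correlated errors, in which case voting helps much less.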



05/24/2021

The Early Bird Catches the Worm: Better Early Life Cycle Defect Predictors

Before researchers rush to reason across all available data, they should...
11/26/2020

Early Life Cycle Software Defect Prediction. Why? How?

Many researchers assume that, for software analytics, "more data is bett...
11/14/2019

On the Time-Based Conclusion Stability of Software Defect Prediction Models

Researchers in empirical software engineering often make claims based on...
03/26/2018

A Quality Model for Actionable Analytics in Rapid Software Development

Background: Accessing relevant data on the software product, process, an...
04/03/2022

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

Most AI projects start with a Python notebook running on a single laptop...
03/13/2018

Building Better Quality Predictors Using "ε-Dominance"

Despite extensive research, many methods in software quality prediction ...
02/02/2022

How to Improve Deep Learning for Software Analytics (a case study with code smell detection)

To reduce technical debt and make code more maintainable, it is importan...