Revisiting Process versus Product Metrics: a Large Scale Analysis

by Suvodeep Majumder, et al.

Numerous methods can build predictive models from software data. But what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (best process/product-based learners respectively achieve recalls of 98% [...]). That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies, since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single-model predictions can become very confused and exhibit very high variance).
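
To make the two ideas in the abstract concrete, the following is a minimal sketch of how recall is computed for a defect predictor and how predictions from several models might be combined. It is an illustration only, not the authors' pipeline: the function names `recall` and `majority_vote`, the majority-vote combination rule, and the toy labels are all assumptions introduced here.

```python
def recall(actual, predicted):
    """Fraction of truly defective items (label 1) that were predicted defective."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from several models.

    An item is labeled defective (1) only if strictly more than half of the
    models say so; ties therefore resolve to 0 (not defective).
    """
    n_models = len(predictions_per_model)
    return [1 if sum(votes) * 2 > n_models else 0
            for votes in zip(*predictions_per_model)]


# Hypothetical toy data: 1 = defective, 0 = clean.
actual = [1, 0, 1, 1, 0, 1]
model_a = [1, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 0, 0, 1]
model_c = [0, 0, 1, 1, 0, 1]

combined = majority_vote([model_a, model_b, model_c])

for name, preds in [("model_a", model_a), ("model_b", model_b),
                    ("model_c", model_c), ("combined", combined)]:
    print(name, recall(actual, preds))
```

In this toy case each single model misses a different defective item, while the vote recovers all of them. That is only one possible way that pooling predictions can reduce the variance of any single model; the paper itself may combine models differently.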




