Revisiting the size effect in software fault prediction models

by   Amjed Tahir, et al.

BACKGROUND: In object oriented (OO) software systems, class size has been acknowledged as having an indirect effect on the relationship between certain artifact characteristics, captured via metrics, and faultproneness, and therefore it is recommended to control for size when designing fault prediction models. AIM: To use robust statistical methods to assess whether there is evidence of any true effect of class size on fault prediction models. METHOD: We examine the potential mediation and moderation effects of class size on the relationships between OO metrics and number of faults. We employ regression analysis and bootstrapping-based methods to investigate the mediation and moderation effects in two widely-used datasets comprising seventeen systems. RESULTS: We find no strong evidence of a significant mediation or moderation effect of class size on the relationships between OO metrics and faults. In particular, size appears to have a more significant mediation effect on CBO and Fan-out than other metrics, although the evidence is not consistent in all examined systems. On the other hand, size does appear to have a significant moderation effect on WMC and CBO in most of the systems examined. Again, the evidence provided is not consistent across all examined systems CONCLUSION: We are unable to confirm if class size has a significant mediation or moderation effect on the relationships between OO metrics and the number of faults. We contend that class size does not fully explain the relationships between OO metrics and the number of faults, and it does not always affect the strength/magnitude of these relationships. We recommend that researchers consider the potential mediation and moderation effect of class size when building their prediction models, but this should be examined independently for each system.


Does class size matter? An in-depth assessment of the effect of class size in software defect prediction

In the past 20 years, defect prediction studies have generally acknowled...

A Simulation Study of Bandit Algorithms to Address External Validity of Software Fault Prediction

Various software fault prediction models and techniques for building alg...

Exploring the relationship between performance metrics and cost saving potential of defect prediction models

Performance metrics are a core component of the evaluation of any machin...

Examining the Relationship of Code and Architectural Smells with Software Vulnerabilities

Context: Security is vital to software developed for commercial or perso...

Combining Spreadsheet Smells for Improved Fault Prediction

Spreadsheets are commonly used in organizations as a programming tool fo...

Co-evolving Tracing and Fault Injection with Box of Pain

Distributed systems are hard to reason about largely because of uncertai...

Assessing Software Defection Prediction Performance: Why Using the Matthews Correlation Coefficient Matters

Context: There is considerable diversity in the range and design of comp...

Please sign up or login with your details

Forgot password? Click here to reset