Data of low quality is better than no data

by Richard Torkar, et al.

Missing data is common in empirical software engineering research, yet the usual way to handle it is to remove the affected data completely. We believe that this is wasteful and should not be done out of habit. This paper presents a typical case in empirical software engineering research: analyzing data that contains missing values and that has been collected and classified by others. By transferring empirical analysis methods from other disciplines, we introduce the reader to approaches suitable for analyzing empirical software engineering data. We present a case study in effort estimation in which we contrast our approach with the methodologies of previous studies. Using principled Bayesian data analysis together with state-of-the-art imputation techniques, we show that missing data and Bayesian data analysis are a good match. The results show that by using low-quality data instead of throwing it away, we still gain a better understanding from the resulting analysis, provided we are prepared to embrace uncertainty. Inferences can become weaker but, we argue, this is how it should be. Empirical software engineering research should make more use of state-of-the-art missing data techniques, not throw data away, and lean towards Bayesian data analysis in order to get a more nuanced view of the challenges we investigate.


