A Taxonomy of Data Quality Challenges in Empirical Software Engineering

06/11/2021
by Michael Franklin Bosu, et al.

Reliable empirical models, such as those used in software effort estimation or defect prediction, are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. For each quality issue, we consider current assessment techniques and, where available, the mechanisms that have been proposed to address it. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling; second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set; and third, factors that prevent or limit data accessibility and trust. We identify this last area as being in particular need of further research.

research
12/20/2020

Experience: Quality Benchmarking of Datasets Used in Software Effort Estimation

Data is a cornerstone of empirical software engineering (ESE) research a...
research
05/23/2021

Data Quality in Empirical Software Engineering: A Targeted Review

Context: The utility of prediction models in empirical software engineer...
research
04/19/2022

Antipatterns in Software Classification Taxonomies

Empirical results in software engineering have long started to show that...
research
02/01/2023

Under the Bridge: Trolling and the Challenges of Recruiting Software Developers for Empirical Research Studies

Much of software engineering research focuses on tools, algorithms, and ...
research
03/24/2015

Measuring Software Quality in Use: State-of-the-Art and Research Challenges

Software quality in use comprises quality from the user's perspective. I...
research
11/14/2019

On the Time-Based Conclusion Stability of Software Defect Prediction Models

Researchers in empirical software engineering often make claims based on...
research
04/01/2019

Data of low quality is better than no data

Missing data is not uncommon in empirical software engineering research ...
