Bayesian Data Analysis in Empirical Software Engineering Research

by Carlo A. Furia, et al.

Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics have traditionally dominated empirical data analysis, and certainly remain prevalent in empirical software engineering. This situation is unfortunate because frequentist statistics suffer from a number of shortcomings (such as lack of flexibility and results that are unintuitive and hard to interpret) that make them especially ill-suited to the large, heterogeneous data that is increasingly available for empirical analysis of software engineering practice. In this paper, we pinpoint these shortcomings and present Bayesian data analysis techniques that work better on the same data, as they can provide clearer results that are simultaneously robust and nuanced. After a short, high-level introduction to the basic tools of Bayesian statistics, our presentation focuses on the reanalysis of two empirical studies: one about the effectiveness of automatically generated tests, and one about the performance of programming languages. By contrasting the original frequentist analysis with our new Bayesian analysis, we demonstrate concrete advantages of using Bayesian techniques, and we advocate a prominent role for them in empirical software engineering research and practice.



