Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data

03/21/2021
by Samuel W. Flint, et al.

Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data, however, time-based data is often dirty. To date, no studies quantify how frequently the software engineering research community uses such data, or investigate the sources of dirty time-based data and quantify how often it occurs. Depending on the research task and method used, including such dirty data could affect the research results. This paper presents the first survey of papers that utilize time-based data, published in the Mining Software Repositories (MSR) conference series. Out of the 690 technical track and data papers published in MSR 2004–2020, we saw at least 35% of papers utilized time-based data. We then used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty commit timestamp data. Finally, we provide guidelines and best practices for researchers utilizing time-based data from Git repositories.

research
09/09/2022

Pitfalls and Guidelines for Using Time-Based Git Data

Many software engineering research papers rely on time-based data (e.g.,...
research
12/22/2017

Behavioral software engineering - guidelines for qualitative studies

Researchers are increasingly recognizing the importance of human aspects...
research
12/28/2021

Recruiting credible participants for field studies in software engineering research

Context: Software practitioners are a primary provider of information fo...
research
09/06/2017

Extracting data from vector figures in scholarly articles

It is common for authors to communicate their results in graphical figur...
research
08/01/2019

Optimum Testing Time of Software using Size-Biased Concepts

Optimum software release time problem has been an interesting area of re...
research
12/02/2020

Software Module Clustering: An In-Depth Literature Analysis

Software module clustering is an unsupervised learning method used to cl...
research
11/23/2020

Distance-based Data Cleaning: A Survey (Technical Report)

With the rapid development of the internet technology, dirty data are co...
