Software Artifact Mining in Software Engineering Conferences: A Meta-Analysis

07/18/2022
by   Zeinab Abou Khalil, et al.

Background: Software development results in the production of various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived mining those artifacts to uncover the inner workings of software development and improve its practices. But which artifacts are studied in the field is a moving target, which we study empirically in this paper.

Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support.

Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period (2004-2020). We analyze the combinations of artifact types that are most often mined together, as well as the relationship between study purposes and mined artifacts.

Results: We find that: (1) mining happens in the vast majority of analyzed papers, (2) source code and test data are the most mined artifacts, (3) there is an increasing interest in mining novel artifacts, together with source code, (4) researchers are most interested in the evaluation of software systems and use all possible empirical signals to support that goal.
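To make the idea of counting mined and co-mined artifacts concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not describe the NLP techniques in detail, so the sketch swaps in plain keyword matching. The keyword table `ARTIFACT_KEYWORDS` and the helpers `detect_artifacts` and `count_artifacts` are hypothetical names introduced only for illustration.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical keyword lists (an assumption for illustration); the paper's
# actual artifact vocabulary and NLP pipeline are not given in the abstract.
ARTIFACT_KEYWORDS = {
    "source code": ["source code"],
    "bug reports": ["bug report", "issue report"],
    "test data": ["test case", "test suite", "test data"],
    "mailing lists": ["mailing list"],
}


def detect_artifacts(text):
    """Return the set of artifact types whose keywords appear in the text."""
    lowered = text.lower()
    found = set()
    for artifact, keywords in ARTIFACT_KEYWORDS.items():
        # Match at a word start; plural forms such as "bug reports" also match.
        if any(re.search(r"\b" + re.escape(k), lowered) for k in keywords):
            found.add(artifact)
    return found


def count_artifacts(papers):
    """Count how often each artifact type, and each pair of types, is detected."""
    single = Counter()
    pairs = Counter()
    for text in papers:
        found = sorted(detect_artifacts(text))
        single.update(found)
        # Every unordered pair of artifact types detected in the same paper.
        pairs.update(combinations(found, 2))
    return single, pairs


if __name__ == "__main__":
    sample = [
        "We mine source code and bug reports.",
        "Test suites and source code are analyzed.",
        "A survey of developers.",
    ]
    single, pairs = count_artifacts(sample)
    print(single.most_common())
    print(pairs.most_common())
```

Sorting the detected types before pairing keeps each pair in a canonical order, so a given combination is counted under one key regardless of where the keywords appear in the text.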

