Multi-Programming-Language Commits in OSS: An Empirical Study on Apache Projects

03/22/2021
by Zengyang Li, et al.

Modern software systems, such as Spark, are usually written in multiple programming languages (PLs). Besides benefiting from code reuse, such systems can also take advantage of specific PLs to implement certain features, to meet various quality needs, and to improve development efficiency. In this context, a change to such a system may need to modify source files written in different PLs. We define a multi-programming-language commit (MPLC) in a version control system (e.g., Git) as a commit that involves modified source files written in two or more PLs. To our knowledge, the phenomenon of MPLCs in software development has not been explored yet. In light of the potential impact of MPLCs on development difficulty and software quality, we performed an empirical study to understand the state of MPLCs, their change complexity, as well as their impact on the open time of issues and the bug proneness of source files in real-life software projects. By exploring the MPLCs in 20 non-trivial Apache projects with 205,994 commits, we obtained the following findings: (1) 9% of the commits from all the projects are MPLCs, and the proportion of MPLCs in 80% of the projects goes to a relatively stable level; (2) more than 90% of the MPLCs from all the projects involve source files written in two PLs; (3) the change complexity of MPLCs is significantly higher than that of non-MPLCs in all projects; (4) issues fixed in MPLCs take significantly longer to be resolved than issues fixed in non-MPLCs in 80% of the projects; and (5) source files that have been modified in MPLCs tend to be more bug-prone than source files that have never been modified in MPLCs. These findings provide practitioners with useful insights on the architecture design and quality management of software systems written in multiple PLs.
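To make the MPLC definition concrete, the following is a minimal Python sketch that labels a commit as an MPLC when its modified source files span two or more PLs. It is an illustration under stated assumptions, not the procedure used in the study: the extension-to-language table `EXT_TO_PL` and the helper names `languages_in_commit` and `is_mplc` are hypothetical, and the abstract does not say how the authors map files to PLs or which file types they count as source files.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-language table (an assumption for illustration;
# the study's actual mapping is not given in the abstract).
EXT_TO_PL = {
    ".java": "Java",
    ".scala": "Scala",
    ".py": "Python",
    ".c": "C",
    ".h": "C",
    ".cpp": "C++",
    ".js": "JavaScript",
    ".go": "Go",
}


def languages_in_commit(modified_paths):
    """Return the set of PLs among the modified source files of one commit.

    Files whose extension is not in EXT_TO_PL (docs, configs, ...) are ignored.
    """
    pls = set()
    for path in modified_paths:
        ext = PurePosixPath(path).suffix.lower()
        if ext in EXT_TO_PL:
            pls.add(EXT_TO_PL[ext])
    return pls


def is_mplc(modified_paths):
    """A commit is an MPLC if its modified source files span two or more PLs."""
    return len(languages_in_commit(modified_paths)) >= 2


if __name__ == "__main__":
    # Made-up file lists, not taken from any real commit.
    mixed = ["core/src/main/scala/Foo.scala", "python/lib/foo.py"]
    single = ["src/main/java/Bar.java", "README.md"]
    print(is_mplc(mixed))
    print(is_mplc(single))
```

In practice the list of modified paths for a commit could come from a command such as `git show --name-only`. Non-source files are deliberately skipped here so that a commit touching one source file plus documentation is not counted as an MPLC.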
