The Health and Wealth of OSS Projects: Evidence from Community Activities and Product Evolution

09/29/2017
by   Saya Onoue, et al.

Background: Understanding the condition of OSS projects is important to analyze features and predict the future of projects. In the field of demography and economics, health and wealth are considered to understand the condition of a country. Aim: In this paper, we apply this framework to OSS projects to understand the communities and the evolution of OSS projects from the perspectives of health and wealth. Method: We define two measures of Workforce (WF) and Gross Product Pull Requests (GPPR). We analyze OSS projects in GitHub and investigate three typical cases. Results: We find that wealthy projects attract and rely on the casual workforce. Less wealthy projects may require additional efforts from their more experienced contributors. Conclusions: This paper presents an approach to assess the relationship between health and wealth of OSS projects. An interactive demo of our analysis is available at goo.gl/Ig6NTR.

I Introduction

It is well known that Open Source Software (OSS) components play a critical role in contemporary software development. With the emergence of open repositories like GitHub, OSS projects are now easily accessible, and they include popular and impactful software such as Linux, Ubuntu, and Firefox.

The onion model has been widely studied for sustainable OSS development communities [1, 2, 3]; it depicts an OSS community as a one-dimensional, onion-like structure. Crowston and Howison concluded that assessing the health of an OSS project is not an easy task [4]. Due to the Bazaar-like structure and altruistic nature of contributors, it is hard to assess the success and livelihood of an OSS project, and even defining the success of an OSS project is difficult. Senyard and Michlmayr studied how a project can establish a community and become a success [5]. Targeting the initial phases of free software projects, they explain what is needed to facilitate the bazaar phase and remain successful. Many studies have categorized contributors based on their contributions. Steinmacher et al. [6] found empirical evidence of the barriers faced by newcomers to OSS projects when placing their first contribution. They argue that onboarding is important for online communities to leverage outsider contributions, and conclude that a smooth first contribution may increase the total number of successful contributions made by both single and long-term contributors.

In this study, we assess OSS projects in the two dimensions of health and wealth. Our work is inspired by Hans Rosling’s talk on the Health and Wealth of Nations (Hans Rosling: The best stats you’ve ever seen, TED 2006. http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen). The Health and Wealth visualization depicts how the life expectancy of nations correlates with their economies. Similarly, we aim to understand Health and Wealth in terms of OSS communities and their projects. Concretely, we represent Health as community activities such as contributor work rate, and Wealth as product evolution, described as the pull requests completed over time. In a preliminary evaluation, we use three case studies selected from 90 OSS projects to highlight the relationship between the Health and Wealth of an OSS project. The following research questions guide our study:

RQ1: Does Wealth influence how OSS communities maintain their Health?

RQ2: Does Health influence how OSS communities maintain their Wealth?

From the results, we observe the following:

  • Less wealthy projects may require additional efforts from their more experienced contributors.

  • Wealthy projects attract and rely on the inexperienced casual workforce.

  • Experienced contributors are the major workforce of an OSS project.

  • Wealthy projects do not depend only on a higher workforce if they have a sufficient casual workforce.

Furthermore, wealthier projects rely on casual contributors to maintain their Health. We envision that the added dimension of Wealth brings an economic perspective to the assessment of OSS projects and may lead to a better understanding of the success and livelihood of OSS projects. An interactive demo that shows the evolution of Health and Wealth of the studied OSS projects is available at goo.gl/Ig6NTR.

II Health and Wealth in OSS Projects

II-A Basic Concepts

The basic concepts of Health and Wealth are best represented by Tom Carden and Gapminder in their “The Wealth & Health of Nations” (https://bost.ocks.org/mike/nations/). They discuss Health and Wealth in the context of nations, showing that these two factors are very strongly related. Rosling says that 80% of the variance of Health can be explained by Wealth.

“This means that we know that increased Wealth is extremely strongly correlated with longer lifespans. There are some interesting details however. It seems that you can advance much faster as a nation if you are healthy first than if you are wealthy first”. Rosling then further elaborates: “Health cannot be bought at the supermarket.” He makes it clear that Health is an investment. He further explains how “You have to build infrastructure, you have to train people, and you have to educate the whole population.”

Our key idea in this paper is to show how these concepts can be applied in an OSS setting. In national economics, health and wealth are commonly measured as follows: Life Expectancy is a statistical measure of the average time a person is expected to live, while Gross Domestic Product (GDP) measures the total value of goods and services produced in a given year within the borders of a given country [7].

II-B Health and Wealth Metrics in an OSS Setting

Unlike the countries of the world, OSS projects are often depicted as Bazaar-like, with no set structure or organization [8, 9, 10]. In this setting, we propose two main factors that influence the livelihood of an OSS project: (i) the activities of its community members and (ii) the evolution of the product itself.

We first define OSS Health as an indicator of three factors describing how community activities are performed in a project, based on our previous studies: (a) the work rate (defined as labor) of each contributor [11], (b) the attractiveness of the community to new contributors [12], and (c) the active retention of experienced members [13]. We measure labor as community contributions within the project. Similar to Rigby et al. [14], we take contributor experience into account to evaluate the attractiveness and retention factors. We consider only source code changes as contributions (i.e., comments made by contributors are ignored).

Let $c_t$ be the number of contributions an individual has made in month $t$. We call an experience-weighted version of $c_t$ the Workforce (WF): the WF of a contributor in month $t$ who has $e$ months of experience in the project, $WF_t$, is $c_t$ weighted by a function of $e$.

Here $e$ is the number of months since the contributor first joined the project, and $c_t$ describes the member's monthly contributions to the OSS project. We use a linear decay function of $e$ to account for the contributor's experience (i.e., the recent contributions of a more experienced contributor carry less weight than those of a newcomer to the project).
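The WF definition can be sketched in Python. This is a minimal sketch under an explicit assumption, not necessarily the authors' exact formula: the hypothetical weight w(e) = max(m, 1 − (e − 1)/H) decays linearly from 1 for a newcomer (e = 1) over a horizon of H months and is clamped at a floor m. Both H = 24 and m = 0.1 are illustrative values, not values from the study, and counting the joining month as e = 1 is likewise an assumption. Here c_t is a contributor's number of source code contributions in month t.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from `start` to `end` (0 if in the same month)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def experience_months(first_contribution: date, month: date) -> int:
    """e: months of experience, counting the joining month as 1 (assumption)."""
    return months_between(first_contribution, month) + 1


def linear_decay_weight(e: int, horizon: int = 24, min_weight: float = 0.1) -> float:
    """Hypothetical weight: 1.0 for a newcomer (e = 1), decreasing linearly
    by 1/horizon per extra month of experience, never below min_weight."""
    return max(min_weight, 1.0 - (e - 1) / horizon)


def workforce(c_t: int, e: int, horizon: int = 24, min_weight: float = 0.1) -> float:
    """WF of one contributor in month t: c_t contributions weighted by experience e."""
    return c_t * linear_decay_weight(e, horizon, min_weight)


if __name__ == "__main__":
    e = experience_months(date(2011, 1, 15), date(2012, 1, 1))  # e = 13
    print(workforce(10, 1), workforce(10, e))  # 10.0 5.0
```

With these illustrative parameters, ten contributions by a newcomer yield a WF of 10.0, while the same ten contributions by a member with 13 months of experience yield 5.0. The floor keeps long-standing members from dropping to zero; whether the study uses such a floor is not stated.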

We use the median of $WF_t$ over all contributors in month $t$ as the indicator of OSS Health in month $t$. The median gives a central workforce value between highly active contributors and casual contributors, so that a few very active members do not dominate the indicator.
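A minimal sketch of the monthly Health indicator: the median of the contributors' WF values. It assumes WF has already been computed for each contributor active in that month, keyed by a hypothetical contributor name. The example illustrates why the median is used: one very active contributor barely moves it.

```python
from statistics import median


def monthly_health(wf_by_contributor: dict[str, float]) -> float:
    """Health indicator for one month: the median WF across contributors.

    Assumes the mapping holds one WF value per contributor active in that month.
    """
    if not wf_by_contributor:
        raise ValueError("no contributors with WF values in this month")
    return float(median(wf_by_contributor.values()))


if __name__ == "__main__":
    wf = {"alice": 40.0, "bob": 2.0, "carol": 3.0}  # hypothetical contributors
    print(monthly_health(wf))  # 3.0 (the mean would be 15.0)
```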

Our definition of OSS Wealth is based on the evolution of the product, i.e., accumulated source code patches. One way to measure this evolution, especially for Git-based projects hosted on platforms such as GitHub, is the number of Pull Requests (PRs). A pull request tells other contributors about changes that you wish to make to the product. Once a PR is opened, contributors can discuss and review the potential changes with the community and can add follow-up commits before the changes are merged into the product source code. After the change is merged, the PR is closed.

We define Gross Product Pull Requests (GPPR) as the number of completed Pull Requests in month $t$, weighted to favor PRs that were completed quickly. Each PR receives a weight $w(PR)$, the number of months the PR took to close (i.e., $w(PR) \geq 1$), and $GPPR_t$ is the count of PRs completed in month $t$, with each PR discounted by its $w(PR)$. Thus, PRs taking more than a month to complete receive less weight.

It is important to note that we do not distinguish the sizes nor difficulties of Pull Requests.
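A minimal sketch of GPPR for a single month. It assumes that each completed PR contributes 1/w(PR), where w(PR) is the number of months the PR took to close, rounded up to whole 30-day months and at least 1. Both the inverse weighting and the 30-day month are assumptions for illustration, and PRs are represented as hypothetical (opened, closed) date pairs. As in the text, PR size and difficulty are ignored.

```python
import math
from datetime import date


def months_to_close(opened: date, closed: date) -> int:
    """w(PR): months a PR took to close, as whole 30-day months rounded up, minimum 1."""
    days = (closed - opened).days
    return max(1, math.ceil(days / 30))


def gppr(prs: list[tuple[date, date]], year: int, month: int) -> float:
    """GPPR for one month: each PR closed in that month adds 1 / w(PR) (assumed weighting)."""
    total = 0.0
    for opened, closed in prs:
        if (closed.year, closed.month) == (year, month):
            total += 1.0 / months_to_close(opened, closed)
    return total


if __name__ == "__main__":
    prs = [
        (date(2012, 3, 1), date(2012, 3, 5)),    # 4 days  -> w = 1, adds 1.0
        (date(2012, 1, 20), date(2012, 3, 10)),  # 50 days -> w = 2, adds 0.5
        (date(2012, 2, 1), date(2012, 2, 3)),    # closed in February, not counted
    ]
    print(gppr(prs, 2012, 3))  # 1.5
```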

III Method

Our goal is to determine whether or not our defined measures provide meaningful and interesting insights into the relationship between OSS Health (WF) and Wealth (GPPR). We use the following two research questions to guide our study:

  • RQ1: Does Wealth influence how OSS communities maintain their Health?

  • RQ2: Does Health influence how OSS communities maintain their Wealth?

To answer both research questions, we conducted an empirical study to measure the Health and Wealth of several OSS projects. Similar to Tom Carden’s Health and Wealth of Nations, we apply our defined metrics of Health and Wealth. We use a case study approach, carefully selecting candidate projects that depict different patterns of Health and Wealth over time. To answer RQ1, we analyzed and compared WF in terms of the labor of both novice and experienced contributors in OSS communities. Finally, to answer RQ2, we investigated the changes in GPPR and the corresponding changes in WF.

IV Studied Projects and Their Community Activities Over Time

IV-A Health and Wealth Over Time

Fig. 1: Health (median WF) and Wealth (GPPR) of 90 OSS projects from October 2010 to December 2012. The selected case studies are highlighted: homebrew (blue), bitcoin (red), and d3 (green). An interactive demo is available at goo.gl/Ig6NTR.
Project | Life Span | # of Contributors | # of Commits | # of Pull Requests | # of Issues
d3 | 6 years and 8 months | 119 | 4,092 | 1,054 | 1,901
bitcoin | 7 years and 9 months | 444 | 13,976 | 7,305 | 3,148
homebrew | 8 years and 1 month | 5,621 | 63,881 | 33,606 | 17,046
TABLE I: Summary Statistics of the Selected Case Studies (snapshot as of June 2017)
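The per-contributor ratios implied by Table I can be checked directly. The snippet below copies the table's figures. By this arithmetic, d3 has the most commits per contributor (about 34.4, versus about 31.5 for bitcoin and 11.4 for homebrew), while bitcoin has the most pull requests per contributor (about 16.5, versus 8.9 for d3 and 6.0 for homebrew).

```python
# Figures copied from Table I (snapshot as of June 2017).
TABLE_I = {
    "d3": {"contributors": 119, "commits": 4_092, "pull_requests": 1_054, "issues": 1_901},
    "bitcoin": {"contributors": 444, "commits": 13_976, "pull_requests": 7_305, "issues": 3_148},
    "homebrew": {"contributors": 5_621, "commits": 63_881, "pull_requests": 33_606, "issues": 17_046},
}


def per_contributor(project: str, field: str) -> float:
    """Average number of `field` items (commits, pull_requests, issues) per contributor."""
    row = TABLE_I[project]
    return row[field] / row["contributors"]


if __name__ == "__main__":
    for name in TABLE_I:
        print(f"{name:<9} commits/contributor = {per_contributor(name, 'commits'):.1f}, "
              f"PRs/contributor = {per_contributor(name, 'pull_requests'):.1f}")
```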

Fig. 2: Median WF from October 2010 to December 2012 for our three case studies.
Fig. 3: Point diagrams of WF and activity periods of all contributors; the horizontal line shows the median WF.
(a) d3: workforce (WF) and experience (e) of contributors; panels 2011/01 = 36, 2011/05 = 3, 2011/11 = 3, 2012/07 = 5.
(b) bitcoin: workforce (WF) and experience (e) of contributors; panels 2011/01 = 4, 2011/07 = 7, 2012/01 = 18, 2012/08 = 17.
(c) homebrew: workforce (WF) and experience (e) of contributors; panels 2011/01 = 2, 2011/07 = 3, 2012/01 = 3, 2012/07 = 3.

Fig. 4: GPPR from October 2010 to December 2012 for our three case studies. The size of each point indicates the total number of contributors of the project.

For our empirical study, we first collected and analyzed 90 OSS projects provided by GHTorrent [15]. From this dataset, we extracted two kinds of contribution activity (i.e., commits and pull requests) needed to compute the WF and GPPR metrics.

Figure 1 shows the Health and Wealth of all projects tracked from October 2010 to December 2012. We highlight three projects that depict different patterns in the relation of Health and Wealth.

From the figure, we identified three types of evolution in OSS Health and Wealth:

  • Consistent Wealth but changes in healthiness: These OSS projects show consistent Wealth; however, they experience changes in their Health.

  • Changes in both Health and Wealth rates: These projects experience changes in both their Health and Wealth over time.

  • Changes in wealthiness while keeping consistent Health: These OSS projects depict Health at a consistent rate, however, the projects increase their Wealth.
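The three patterns above were identified visually from Figure 1. As a hypothetical way to make the distinction repeatable, the sketch below labels a project from its monthly Health and Wealth series by checking how much each series varies, using the coefficient of variation (standard deviation divided by mean) and an arbitrary threshold of 0.5. Neither the statistic nor the threshold comes from the study; the classifier flags change in either direction and adds a fourth label for projects where neither series changes.

```python
from statistics import mean, pstdev


def coefficient_of_variation(series: list[float]) -> float:
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(series)
    return pstdev(series) / m if m else 0.0


def classify_trajectory(health: list[float], wealth: list[float],
                        threshold: float = 0.5) -> str:
    """Assign one of the three Health/Wealth patterns (or a fourth 'stable' label).

    `threshold` is a hypothetical cut-off: a series whose coefficient of
    variation exceeds it is treated as 'changing' (in either direction).
    """
    health_changes = coefficient_of_variation(health) > threshold
    wealth_changes = coefficient_of_variation(wealth) > threshold
    if health_changes and wealth_changes:
        return "changing Health and Wealth"
    if health_changes:
        return "consistent Wealth, changing Health"
    if wealth_changes:
        return "changing Wealth, consistent Health"
    return "consistent Health and Wealth"


if __name__ == "__main__":
    # Toy monthly series, not data from the study.
    print(classify_trajectory([30, 4, 3, 5], [2, 2, 3, 2]))
    print(classify_trajectory([3, 3, 3, 3], [10, 20, 40, 80]))
```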

IV-B Case Study Selections

Figure 1 shows three OSS projects that depict three distinct patterns between the Health and Wealth of an OSS project over time. Table I shows the detailed information of each of these projects.

  • d3 (green; https://github.com/d3/d3) is a JavaScript library for visualizing data using web standards. As shown in Table I, this project is the youngest and smallest of the three selected projects, with the fewest contributors, commits, pull requests, and issues. d3 experienced changes in WF (Health) while keeping a low but consistent GPPR (Wealth).

  • bitcoin (red; https://github.com/bitcoin/bitcoin) is software that enables the use of the currency referred to as bitcoin. As shown in Table I, this project falls between the other two on every measure. Of the three selected projects, bitcoin exhibits the most pull requests per contributor (about 16, versus about 9 for d3 and 6 for homebrew). bitcoin experienced changes in both GPPR (Wealth) and WF (Health).

  • homebrew (blue; https://github.com/Homebrew/brew) is a package management system that simplifies the installation of software on Apple’s Mac OS operating system. As shown in Table I, this project is the oldest and largest of the three selected projects (i.e., it has the largest community of contributors and the most commits, pull requests, and issues). homebrew experienced an increase in GPPR (Wealth) while keeping a consistently low WF (Health).

It is difficult to judge whether or not these selected projects are successful. However, since these projects have been active for 6 to 8 years, we assume that they are representative of typical OSS projects and their communities.

V Results

RQ1: Does Wealth influence how OSS communities maintain their Health?

Figure 2 shows the median WF for our three case studies over time. For a deeper analysis of Health in terms of WF and contributor experience, Figure 3 depicts four snapshots of WF along with contributor experience. From these figures, we make three observations:

“Less wealthy projects may require additional efforts (i.e., higher WF) from their more experienced contributors.”

Figure 3 shows that, overall, less wealthy projects such as d3 (green) occasionally experience bursts (i.e., spikes in WF) from their community workforce. Furthermore, Figure 3(a) and Figure 3(b) show that d3 and bitcoin received more WF from their more experienced contributors. We conjecture that less wealthy projects (i.e., d3 and bitcoin) occasionally require more WF from their experienced contributors.

“Wealthy projects attract and rely on the inexperienced casual workforce.”

Figure 3 shows that homebrew keeps a consistently low WF. One explanation for the low WF may be the high number of less experienced (i.e., casual) contributors in the community. We conjecture that the key to homebrew’s Wealth is its ability to attract and keep these casual contributors.

“Experienced contributors are the major workforce of an OSS project.”

Generally, from Figure 3, we observe that the highest WF values come from the more experienced contributors. This result confirms the common notion that experienced contributors are indeed the major workforce of an OSS project.

In summary, to answer RQ1, we find that wealthy projects may not necessarily depend only on a higher workforce, as they can rely on many contributions from their casual contributors. Conversely, less wealthy projects may require additional effort from their more experienced workforce.

RQ2: Does Health influence how OSS communities maintain their Wealth?

Figure 4 shows the GPPR for our three case studies over time. Using this Figure, we make the following observation:

“Wealthy projects do not depend only on higher workforce.”

In Figure 4, we can clearly observe the difference in GPPRs between the wealthiest project (i.e., homebrew) and less wealthy projects (i.e., bitcoin and d3).

It is interesting to note that homebrew maintains a consistent Wealth of pull requests even though, under deeper manual analysis, we found homebrew contributors to be less effective, with many contributors ignoring a significant number of pull requests. Therefore, consistent with RQ1, we conclude that such projects rely on the size of their casual contributor base to maintain their Wealth.

On the other hand, less wealthy projects remain at a lower GPPR, requiring more WF (Health) from their contributors. However, there are cases in which a project increases its Wealth. For example, bitcoin occasionally increased its GPPR. A manual analysis revealed that a single contributor had submitted 58 PRs in a short period, thus increasing the project’s Wealth. In fact, their contribution accounts for about one-third of the total GPPR (177 PRs) during this period. We conjecture that such an experienced workforce does influence Wealth.
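As a quick arithmetic check of the “about one-third” figure, treating both numbers as PR counts as the text does:

```latex
\frac{58}{177} \approx 0.328 \approx \frac{1}{3}
```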

In summary, to answer RQ2, we find that less wealthy projects rely mainly on their active workforce to maintain or increase their Wealth.

VI Related Work

Ye et al. examined the structure of Free and Open Source Software (F/OSS) communities and the co-evolution of F/OSS systems and communities. They report that although F/OSS systems and communities generally co-evolve, they co-evolve differently depending on the goal of the system and the structure of the community [16]. Our study also addresses product evolution and community activities by analyzing the Health and Wealth of OSS projects, and it can help to understand the co-evolution of OSS systems and communities.

Gousios et al. explored how pull-based software development works in OSS projects [17, 18, 19]. They find that the pull request model offers fast turnaround, increased opportunities for community engagement, and decreased time to incorporate contributions. Our study likewise presents GPPR, a measure based on the number of pull requests, which can help to understand whether a project takes advantage of pull requests.

Zhou and Mockus studied long-term contributors (LTCs), analyzing the behavior of individual participants in Gnome and Mozilla [20]. They report that future LTCs tend to be more active and show more community-oriented attitudes than other joiners during their first month. Pinto et al. analyzed the activities of casual contributors [21], i.e., developers who do not want to become active members, and describe how casual contributors foster diversity and collaboration. Our study presents WF, a measure of contributors’ activities, which we believe is important for clarifying whether a project has experienced contributors and casual contributors.

VII Conclusions

Economists consider Wealth and Health to be essential for understanding the condition and the future of a country. In this study, we apply the concepts of Health and Wealth to OSS projects. We focus on the number of submitted and closed pull requests and on contributors’ activity and experience, and define two measures: Workforce (WF) and Gross Product Pull Requests (GPPR).

From these measures, we identified three patterns in the evolution of OSS Health and Wealth. First, we analyzed projects with changes in Health while keeping consistent Wealth. Second, we studied changes in Wealth while keeping consistent Health. Finally, we studied changes in both Health and Wealth. From this analysis, we find that wealthy projects attract and rely on the inexperienced casual workforce, while less wealthy projects may require additional effort from their more experienced contributors.

Our future work includes a more in-depth study with metrics adapted from other fields and a bigger dataset, to clarify the relationship between the Wealth and Health of OSS projects, viewing Wealth from an economic perspective and Health from a demographic one.

References

  • [1] K. Nakakoji, Y. Yamamoto, Y. Nishinaka, K. Kishida, and Y. Ye, “Evolution patterns of open-source software systems and communities,” in Proc. of the Int. Workshop on Principles of Software Evolution, IWPSE 2002, 2002, pp. 76–85.
  • [2] Y. Ye and K. Kishida, “Toward an understanding of the motivation of open source software developers,” in Proc. 25th Int. Conf. on Softw. Eng., ICSE 2003, pp. 419–429, 2003.
  • [3] M. Aberdour, “Achieving quality in open source software,” IEEE Software, vol. 24, no. 5, pp. 58–64, 2007.
  • [4] K. Crowston and J. Howison, “Assessing the Health of Open Source Communities,” IEEE Computer, vol. 39, no. 5, pp. 89–91, 2006.
  • [5] A. Senyard and M. Michlmayr, “How to Have a Successful Free Software Project,” in 11th Asia-Pacific Software Engineering Conference, APSEC 2004, 2004, pp. 1–8.
  • [6] I. Steinmacher, T. Conte, M. A. Gerosa, and D. Redmiles, “Social barriers faced by newcomers placing their first contribution in open source software projects,” pp. 1379–1392, 2015.
  • [7] T. Piketty, Capital in the Twenty-First Century. Éditions du Seuil, Belknap Press, 2013, pp. 43, 385–390.
  • [8] C. Bird, D. Pattison, R. D. Souza, V. Filkov, and P. Devanbu, “Latent Social Structure in Open Source Projects,” in Proc. of the 16th ACM SIGSOFT Int. Symposium on Foundations of Softw. Eng., pp. 24–35, 2008.
  • [9] C. Bird, “Sociotechnical coordination and collaboration in open source software,” IEEE Int. Conf. on Softw. Maintenance, ICSM 2011, pp. 568–573, 2011.
  • [10] R. P. L. Buse and T. Zimmermann, “Information needs for software development analytics,” in Proc. of the 34th Int. Conf. on Softw. Eng., ICSE 2012, 2012, pp. 987–996.
  • [11] S. Onoue, H. Hata, and K. Matsumoto, “A Study of the Characteristics of Developers’ Activities in GitHub,” in Proc. of 5th Int. Works. on Empirical Softw. Eng. in Practice, IWESEP 2013, 2013, pp. 7–12.
  • [12] S. Onoue, H. Hata, and K. Matsumoto, “Software Population Pyramids: The Current and the Future of OSS Development Communities,” pp. 34:1–34:4, 2014.
  • [13] S. Onoue, H. Hata, A. Monden, and K. Matsumoto, “Investigating and projecting population structures in open source software projects: A case study of projects in GitHub,” IEICE Transactions on Information and Systems, vol. E99D, no. 5, pp. 1304–1315, 2016.
  • [14] P. C. Rigby, D. M. German, L. Cowen, and M.-A. Storey, “Peer Review on Open-Source Software Projects,” ACM Transactions on Softw. Eng. and Methodology, 2014.
  • [15] G. Gousios, “The GHTorrent dataset and tool suite,” IEEE Int. Working Conf. on Mining Softw. Repositories, MSR 2013, pp. 233–236, 2013.
  • [16] Y. Ye, K. Nakakoji, Y. Yamamoto, and K. Kishida, “The co-evolution of systems and communities in free and open source software development,” Free/Open Source Software Development, pp. 59–82, 7 2004.
  • [17] G. Gousios, M. Pinzger, and A. V. Deursen, “An exploratory study of the pull-based software development model,” In Proc. of the 36th Int. Conf. on Softw. Eng., ICSE 2014, pp. 345–355, 2014.
  • [18] G. Gousios, A. Zaidman, M. A. Storey, and A. Van Deursen, “Work practices and challenges in pull-based development: The integrator’s perspective,” In Proc. of the 37th Int. Conf. on Softw. Eng., ICSE 2015, vol. 1, pp. 358–368, 2015.
  • [19] G. Gousios, M.-A. Storey, and A. Bacchelli, “Work practices and challenges in pull-based development: the contributor’s perspective,” in Proc. of the 38th Int. Conf. on Softw. Eng., ICSE 2016, pp. 285–296, 2016.
  • [20] M. Zhou and A. Mockus, “What make long term contributors: Willingness and opportunity in oss community,” pp. 518–528, 2012.
  • [21] G. Pinto, I. Steinmacher, and M. A. Gerosa, “More Common Than You Think: An In-depth Study of Casual Contributors,” 2016 IEEE 23rd Int. Conf. on Softw. Analysis, Evolution, and Reengineering, pp. 112–123, 2016.