We Don't Need Another Hero? The Impact of "Heroes" on Software Development

10/25/2017 · by Amritanshu Agrawal, et al.

A software project has "Hero Developers" when 80% of its contributions are delivered by 20% of its developers. Is this bad for software quality? Is it better to have more or fewer heroes for different kinds of projects? To answer these questions, we studied 661 projects from public open source software (OSS) Github and 171 projects from an Enterprise Github. We find that hero projects are very common. In fact, as projects grow in size, nearly all projects become hero projects. These findings motivated us to look more closely at the effects of heroes on software development. Analysis shows that the frequency with which issues and bugs are closed is not significantly affected by the presence of heroes or by project type (Public or Enterprise). Similarly, the time needed to resolve an issue/bug/enhancement is not affected by heroes or project type. This is a surprising result since, before looking at the data, we expected that adding heroes to a project would slow down how fast that project reacts to change. However, we do find a statistically significant association between heroes, project types, and enhancement resolution rates. Heroes do not affect enhancement resolution rates in Public projects. However, in Enterprise projects, more heroes increase the rate at which projects complete enhancements. In summary, our empirical results call for a revision of a long-held truism in software engineering. Software heroes are far more common and valuable than suggested by the literature, particularly for medium to large Enterprise developments. Organizations should reflect on better ways to find and retain more of these heroes.




1. Introduction

Many projects are initiated by a project leader who stays with the project for the longest duration (Ye and Kishida, 2003). These leaders moderate the project, contribute the most, and stay the most active throughout the software development life cycle. Such developers are sometimes called hero, core, or lone contributors (Martínez-Torres and Diaz-Fernandez, 2014). In the literature (Goeminne and Mens, 2011; Torres et al., 2011; Robles et al., 2009; Yamashita et al., 2015), it is usual to define a hero project as one where 80% of the contributions are made by 20% of the developers.

In the literature, it is usual to deprecate heroes (Bier et al., 2011; Morcov, 2012; Hislop et al., 2002; Boehm, 2006; Wood-Harper and Wood, 2005) since they can become a bottleneck that slows down project development. That said, looking through the literature, we cannot find any large-scale studies on the effect of heroes in Enterprise projects. Accordingly, to better understand the positive or negative impact of heroes on software development, we mined 661 Public open source software (OSS) projects and 171 Enterprise Github projects (here, Enterprise projects are in-house proprietary projects that use Github Enterprise repositories to manage their development). After applying statistical tests to this data, we found some surprises:


  • Hero projects are exceedingly common in both Public and Enterprise projects, and the ratio of hero programmers in a project does not affect the development process, at least for the metrics we examined, with two exceptions;

  • Exception #1: in larger projects, heroes are far more common, that is, large projects need their heroes;

  • Exception #2: heroes have a positive impact on Enterprise projects; specifically, the more heroes, the faster the enhancement resolution rates in those kinds of projects.

This was surprising since, before mining the data, our expectation was that heroes have a large negative effect on software development, particularly for Public projects where the work is meant to be spread around a large community.

The rest of this paper explains how we made and justified these findings. This investigation is structured around the following research questions:


  • RQ1: How common are heroes?

    From this analysis, we found:

Result 1.

Over 77% of projects exhibit the pattern that 20% of the total contributors complete 80% of the contributions. This holds true for both Public and Enterprise projects.


  • RQ2: How does team size affect the prevalence of hero projects?

    After dividing teams into small, medium and large sizes, we found that:

Result 2.

As team size increased, an increasing proportion of projects become hero projects. This is true for both Public and Enterprise projects.


  • RQ3: Are hero projects associated with better software quality?

    We extracted 6 quality measures, namely number of issues, bugs and enhancements being resolved, and the time taken to resolve these issues, bugs and enhancements.


    • a: Does having a hero programmer improve the number of issues, bugs and enhancements being resolved?

    Result 3.

    For both Public and Enterprise projects, there is no statistical difference between the percent of issues and bugs being resolved within hero and non-hero projects. However, for enhancement issues, Enterprise/Public hero projects closed statistically more/fewer issues (respectively).


    • b: Does having a hero programmer improve the time to resolve issues, bugs and enhancements?

    Result 4.

    There was no statistically significant difference in the resolution times of issues, bugs and enhancements between non-hero and hero projects in either case (Public or Enterprise).

Based on the above, we say that our empirical results call for a revision of a long-held truism in software engineering. Software heroes are far more common and valuable than suggested by the literature, particularly for medium to large Enterprise developments. Organizations should reflect on better ways to find and retain more of these heroes.

The rest of this paper is structured as follows. Following this introduction, Section 2 gives a literature review regarding Hero programmers in OSS then Section 3 describes the data extraction process and the experimentation details. The research questions are answered in Section 4 and the implications of these results are discussed in Section 5. Finally, we discuss the validity and conclusion of our results.

2. Background and Related Work

2.1. Project Roles

Following on from Ye and Martinez et al. (Ye and Kishida, 2003; Martínez-Torres and Diaz-Fernandez, 2014), we say that there are many developer roles within a Public or Enterprise software project:


  • Project leaders, who initiate a project;

  • Core members, who work on the project and make many contributions over extended time periods;

  • Active developers, who contribute regularly to new enhancements and bug fixes;

  • Peripheral developers, who occasionally contribute to new enhancements;

  • Bug fixers;

  • Bug reporters;

  • Bug readers;

  • Passive users.

Of the above, core developers can be project leaders or core members. Core developers are the few central developers who implement most of the code changes and make important project direction decisions, while the other peripheral developers are the “many eyes” of the project who make small changes such as bug fixes (Mockus et al., 2002; Tsay et al., 2014).

Core developers, who make up just about 20% of a project's team, are said to contribute roughly 80% of the code (Goeminne and Mens, 2011; Torres et al., 2011; Robles et al., 2009; Yamashita et al., 2015). These contributions can be recorded in terms of how many commits they made or how many lines of code (LOC) they changed. Research studies (Krishnamurthy, 2002; Peterson, 2013) suggest that most work/contributions are done by lone developers. A core committer is also one who has write access to a project’s repository (Padhye et al., 2014). These developers are also called hero programmers.

Pinto et al. (Pinto et al., 2016) studied 275 OSS projects and found that about 48% of the developer population (which we are calling peripheral developers) committed only 1.73% of the total number of commits. Even within these contributions, about 28.6% simply fixed typos, grammar, and similar issues; 30.2% tried to fix bugs; 8.9% refactored code; and only 18.7% contributed new features. Yamashita et al. (Yamashita et al., 2015) also found different proportions of contribution activity among core and peripheral developers.

Since the work in projects is not evenly divided, this motivates our research on the overall effects on the projects of different levels of contributions by different developers.

2.2. Related Work

To the best of our knowledge, this paper is the largest study on the effects of heroes in Public and Enterprise projects. The rest of this section describes some of the other related work we have found in this area, but it should be noted that none of the following studies (a) explore as many projects as we do or (b) compare effects across Public and Enterprise projects.

The benefits and drawbacks of heroes are widely discussed in the literature. Bach (Bach, 1995) notes that such heroes are enlisted to (e.g.) speed the delivery of late projects (Cullom and Cullom, 2006). On the other hand, hero-based projects have their drawbacks. In hero projects, there is less collaboration between team members since there are few active team members. Such collaborations can be highly beneficial. Studies that analyzed distributed software development on social coding platforms like Github and Bitbucket (Dias et al., 2016; Cosentino et al., 2017) commented on how social collaborations can reduce the cost and effort of software development without degrading software quality.

Distributed coding effort gives rise to agile community-based programming practices which can in turn have higher customer satisfaction, lower defect rates, and faster development times (Moniruzzaman and Hossain, 2013; Rastogi et al., 2017). Such practices can lead to increased customer satisfaction when faster development leads to:


  • Lowering the issues/bugs/enhancements resolution times (Mockus et al., 2002; Jarczyk et al., 2014; Bissyandé et al., 2013; Athanasiou et al., 2014; Gupta et al., 2014; Reyes López, 2017);

  • Increasing the number of issues/bugs/enhancements being resolved (Jarczyk et al., 2014).

More specifically, as to issues related to heroes, Bier et al. (Bier et al., 2011) warn that as projects become more and more complex, teams should be communities of experts specialized in niche domains rather than being led by “cowboy programmers” (a.k.a. heroes) (Morcov, 2012). Such hero programmers are often associated with certain process anti-patterns, such as poorly documented systems (when heroes generate more code than documents about that code (Hislop et al., 2002)) or all-night hackathons to hastily patch faulty code to meet deadlines, thus introducing more bugs into the system and decreasing the number of people who understand the whole system (Boehm, 2006). Also, Wood et al. (Wood-Harper and Wood, 2005) caution that heroes are often code-focused, but software development needs workers acting as more than just coders (testers, documentation authors, user-experience analysts).

Our summary of the above is as follows: with only isolated exceptions, most of the literature deprecates heroes even though the value (or otherwise) of heroes in Enterprise software developments has rarely been investigated. Accordingly, in this paper, we compare and contrast the effects of heroes in Public and Enterprise development.

3. Data and Experimentation

3.1. Data

To perform our experiments, we used OSS projects from public and Enterprise Github. Of the publicly available projects hosted on public Github, a selected set of projects are marked as “showcases” to demonstrate how a project can be developed in a certain domain, such as game development or music (Github, 2017). By selecting these Github projects, we can ensure we are using an interesting and representative set of open source projects. Examples of popular projects included in the Github showcases that we used for our analysis are: Javascript libraries such as ‘AngularJS’ (https://github.com/angular/angular.js) and ‘npm’ (https://github.com/npm/npm), and programming languages such as ‘Go’ (https://github.com/golang/go), ‘Rust’ (https://github.com/rust-lang/rust), and ‘Scala’ (https://github.com/scala/scala).

Not all projects hosted on Github are suitable for analysis. Studies by (Kalliamvakou et al., 2014; Bird et al., 2009; Munaiah et al., 2017) advise that researchers should filter out projects that are not suitable for analysis. Such unsuitable projects might record only minimal development activity, be used only for personal purposes, or not even be related to software development at all. Accordingly, we applied the following filtering rules.

We started off with 1,108 Public and 538 Enterprise Github projects. Following the advice of others (Kalliamvakou et al., 2014) (Bird et al., 2009), we pruned as follows:


  • Collaboration: The number of pull requests is indicative of how many other peripheral developers work on the project. Hence, a project must have at least one pull request.

  • Commits: The project must contain more than 20 commits.

  • Duration: The project must contain software development activity of at least 50 weeks.

  • Issues: The project must contain more than 10 issues.

  • Personal Purpose: The project must not be used and maintained by one person. The project must have at least eight contributors.

  • Releases: The project must have at least one release.

  • Software Development: The project must actually host software development source code.
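Taken together, the pruning rules above can be written as a single predicate over pre-extracted project counts. This is a minimal sketch; the field names are illustrative, not the Github API's:

```python
def passes_sanity_checks(project: dict) -> bool:
    """Apply the Section 3.1 pruning rules to one project record.

    `project` is assumed to be a dict of pre-extracted counts, e.g.:
    {"pull_requests": 3, "commits": 120, "duration_weeks": 80,
     "issues": 45, "contributors": 12, "releases": 2, "is_software": True}
    """
    return (project["pull_requests"] >= 1        # Collaboration
            and project["commits"] > 20          # Commits
            and project["duration_weeks"] >= 50  # Duration
            and project["issues"] > 10           # Issues
            and project["contributors"] >= 8     # Not personal-purpose
            and project["releases"] >= 1         # Releases
            and project["is_software"])          # Software development
```

A project failing any one rule is discarded, matching the sequential top-to-bottom filtering reported in Table 1.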

Sanity check                            Discarded project count
                                        Enterprise      Public
No. of commits (<= 20)                      68             96
No. of issues (<= 10)                       60             89
Personal purpose (< 8 contributors)         47             67
Software development only                    9             51
Duration (< 50 weeks)                       12             46
No. of releases (none)                     136             44
Collaboration (no pull requests)            35             54
Projects left after filtering              171            661
Table 1. Filtering criteria. Starting with 1,108 Public and 538 Enterprise projects, we discarded projects that failed any of the above tests, arriving at 661 Public and 171 Enterprise projects.

After applying these criteria, we obtained 661 open source and 171 proprietary projects. Table 1 reports how many projects were discarded by each sanity check; the checks were applied sequentially, from top to bottom. We used the Github API to extract the necessary information from these projects and tested each criterion stated above. Upon completion, we obtained a list of projects from which we extracted the metrics needed to answer our research questions. We repeated this procedure for both our Public and Enterprise Github data sources.

3.2. Metric Extraction

To answer our research questions, we extracted the number of commits made by individual developers. If 20% of the developers made more than 80% of the commits, the project was classified as a hero project; all others were classified as non-hero projects (these thresholds were selected based on the advice of Yamashita et al. (Yamashita et al., 2015)).
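The 80/20 classification can be sketched as follows (a minimal illustration, not the authors' extraction code; `commits_per_dev` is an assumed list of per-developer commit counts):

```python
import math

def is_hero_project(commits_per_dev):
    """Return True if the top 20% of developers (by commit count)
    account for more than 80% of all commits (Yamashita et al.'s
    thresholds)."""
    counts = sorted(commits_per_dev, reverse=True)
    top_n = max(1, math.ceil(0.2 * len(counts)))  # the "heroes"
    return sum(counts[:top_n]) > 0.8 * sum(counts)
```

For example, a five-developer project where one developer made 90 of 100 commits would be classified as a hero project, while a project with work spread evenly across its team would not.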

Note that Github allows maintainers to merge pull requests from external developers; when merged, those contributions are credited to the merging contributor as well. Such merges could attribute extra contributions to a hero developer and thus over-inflate the “hero effect”; hence, we did not include pull merge requests.
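One way to screen out such merges, assuming each commit record lists its parent SHAs (as the Github commits API does), is to drop any commit with more than one parent. This is an illustrative sketch, not the authors' pipeline:

```python
def non_merge_commits(commits):
    """Keep only commits with at most one parent.

    A commit with two or more parents is a merge (e.g., a merged pull
    request) and would over-credit the merging developer, so it is
    excluded from the per-developer commit counts.
    """
    return [c for c in commits if len(c.get("parents", [])) <= 1]
```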

Figure 1. Distribution of hero and non-hero projects in Public and Enterprise projects. Note that hero projects are very common.

We next divided each project based on the team size. After applying the advice of Gautam et al. (Gautam et al., 2017), we use 3 team sizes:


  • Small teams: more than 8 but fewer than 15 developers;

  • Medium teams: more than 15 but fewer than 30 developers;

  • Large teams: more than 30 developers.
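These bins can be expressed as a small helper. The handling of the exact boundary values 15 and 30 is our assumption, since the text leaves it open:

```python
def team_size_bucket(n_devs: int) -> str:
    """Map a contributor count to the three team-size bins used here.

    Projects with fewer than 8 developers were already removed by the
    'personal purpose' filter, so n_devs >= 8 is assumed.
    """
    if n_devs <= 15:
        return "small"     # 8-15 developers
    elif n_devs <= 30:
        return "medium"    # 16-30 developers
    return "large"         # more than 30 developers
```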

We then defined 6 metrics, namely:

(1) Percent of issues resolved
(2) Percent of bug-tagged issues resolved
(3) Percent of enhancement-tagged issues resolved
(4) Median time taken to resolve issues
(5) Median time taken to resolve bug-tagged issues
(6) Median time taken to resolve enhancement-tagged issues
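The resolution-time metrics can be computed from issue timestamps. A sketch, assuming each issue is represented as a (created_at, closed_at) pair of datetimes (our illustrative representation, not the paper's actual pipeline):

```python
from datetime import datetime
from statistics import median

def median_resolution_hours(issues):
    """Median open-to-close time, in hours, over the closed issues.

    `issues` is a list of (created_at, closed_at) datetime pairs;
    still-open issues (closed_at is None) are skipped.
    """
    durations = [(closed - created).total_seconds() / 3600.0
                 for created, closed in issues if closed is not None]
    return median(durations) if durations else None
```

Filtering the input list to bug-tagged or enhancement-tagged issues before calling this function yields metrics (5) and (6).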

3.3. Statistical Tests

When comparing results between hero and non-hero projects, we used a statistical significance test and an effect size test. Significance tests are useful for detecting whether two populations differ by more than random noise, while effect sizes are useful for checking that two populations differ by more than a trivial amount.

For the significance test, we use the Scott-Knott procedure recommended at TSE’13 (Mittas and Angelis, 2013) and ICSE’15 (Ghotra et al., 2015). This technique recursively bi-clusters a sorted set of numbers. If any two clusters are statistically indistinguishable, Scott-Knott reports them both as one group. Scott-Knott first looks for a break in the sequence that maximizes the expected value of the difference in the means before and after the break. More specifically, it splits a list l of size |l| into sub-lists l1 and l2 (of sizes |l1| and |l2|, where |l| = |l1| + |l2|) in order to maximize the expected value of differences in the observed performances before and after the division; i.e., Scott-Knott divides the sequence at the break that maximizes:

E(Δ) = |l1|/|l| · (E(l1) − E(l))² + |l2|/|l| · (E(l2) − E(l))²

Scott-Knott then applies a statistical hypothesis test to check if l1 and l2 are significantly different. If so, Scott-Knott recurses on each division. For this study, our hypothesis test was a conjunction of the A12 effect size test (endorsed by (Arcuri and Briand, 2011)) and non-parametric bootstrap sampling (Efron and Tibshirani, 1994); i.e., our Scott-Knott divided the data only if both bootstrapping and the effect size test agreed that the division was statistically significant (90% confidence) and not a “small” effect.
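A minimal sketch of the Vargha-Delaney A12 effect size used in that conjunction (the bootstrap step is omitted for brevity; this is our illustration, not the authors' code):

```python
def a12(xs, ys):
    """Vargha-Delaney A12: the probability that a random draw from xs
    is larger than a random draw from ys, counting ties as half."""
    more = same = 0
    for x in xs:
        for y in ys:
            if x > y:
                more += 1
            elif x == y:
                same += 1
    return (more + 0.5 * same) / (len(xs) * len(ys))
```

An A12 of 0.5 means the two samples are indistinguishable; values near 0 or 1 indicate one sample stochastically dominates the other.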

4. Results

4.1. RQ1: How common are heroes?

Recall that we define a project to be heroic when 80% of the contributions are done by about 20% of the developers (Yamashita et al., 2015). To assess the prevalence of such projects, we extracted the above features and classified these projects into hero and non-hero.

As shown in Figure 1, 77% and 78% of projects are driven by hero or core developers in Public and Enterprise projects, respectively. This trend was also observed by Pinto et al. (Pinto et al., 2016).

Why so many heroes? One explanation is that our results are incorrect, being merely a result of the “build effect” reported by Kocaguneli et al. (Kocaguneli et al., 2013). In their work with Microsoft code files, Kocaguneli et al. initially found an effect that seems similar to heroes. Specifically, in their sample, most of the files were most often updated by a very small number of developers. It turned out that those “heroes” were, in fact, build engineers who had the low-level, almost clerical task of running the build scripts and committing the auto-generated files. If our results were conflated in the same way, then all the results of this paper would be misleading.

We say that our results do not suffer from the Kocaguneli build effect, for two reasons:


  • Kocaguneli reported an extremely small number of build engineers (dozens, out of a total population of thousands of engineers). The heroes found in this study are far more frequent than that.

  • As mentioned before, we removed all pull merge requests from the commits to remove any extra contributions attributed to the hero programmer. This means that contributions aggregated across many developers would not be credited to a few build engineers in our sample.

Figure 2. Public projects: hero and non-hero projects for different team sizes. The percentages within the histogram bars show that, as team size grows, the ratio of hero projects increases.
Figure 3. Enterprise projects: hero and non-hero projects for different team sizes. As before, when team size grows, hero projects dominate our sample.

If the build effect does not explain these results, what does? We think the high frequency of heroes can be explained by the nature of software development. For example, consider Github OSS projects: they are often started by a project leader (Ye and Kishida, 2003) who is responsible for maintaining and moderating the project. Until the project becomes popular, only the leader is responsible for making major code contributions (Tsay et al., 2014). Once the project has become stable and popular, the ongoing issue/bug/enhancement fixes are just a few lines of code made by peripheral developers (Pinto et al., 2016). Note that such a track record would naturally lead to heroes.

Whatever the reason, the pattern is very clear. The ratio of hero projects in Figure 1 is so large that it motivates the rest of this paper. Accordingly, we move on to study the impact of heroes on software quality.

4.2. RQ2: How does team size affect the prevalence of hero projects?

Figures 2 and 3 show the distribution of hero and non-hero projects across different team sizes in Public and Enterprise projects, respectively. The clear pattern in these results is that as teams grow larger, they become more dependent on heroes. In fact, for large projects, non-hero projects almost disappear.

That is, contrary to established wisdom in the field (Bier et al., 2011), what we see here is that most projects make extensive use of heroes. We conjecture that the benefit of having heroes, where a small group handles the complex communications seen in large projects, outweighs the theoretical drawbacks of heroes.

Figure 4. Public projects: hero and non-hero values of the issue, bug, and enhancement close ratios (the number of issue, bug, and enhancement reports closed over the total created, respectively). Of these distributions, only the enhancement rates differ between hero and non-hero projects.
Figure 5. Enterprise projects: hero and non-hero values of the issue, bug, and enhancement close ratios (the number of issue, bug, and enhancement reports closed over the total created, respectively). As before, only the enhancement rates differ between hero and non-hero projects.

4.3. RQ3: Are hero projects associated with better software quality?

We divide this investigation into two steps: RQ3a and RQ3b. RQ3a explores the ratio of issues/bugs/enhancements successfully closed. Next, RQ3b explores the time required to close those issues.

4.3.1. RQ3a: Does having a hero programmer improve the number of issues, bugs and enhancements being resolved?

Figures 4 and 5 show boxplots of the metrics reporting the ratio of closed issues, bugs, and enhancements. Note that larger numbers are better.

In these figures, the x-axis separates our hero and non-hero projects (found using the methods of RQ1). Each x-axis label is further annotated with “Rk:1” or “Rk:2”, which is the result of a statistical comparison of the two populations using the Scott-Knott test explained in Section 3.3. Note that in Figures 4 and 5, for the issue and bug closed ratios, the two distributions have the same rank, i.e., “Rk:1”. This means that these populations are statistically indistinguishable.

On the other hand, the ratio of closing enhancement issues in Public and Enterprise projects is statistically distinguishable, as shown by the “Rk:1” and “Rk:2” labels on those plots. Interestingly, the direction of change is different in Public and Enterprise projects:


  • In Public projects, heroes close the fewest enhancement issues;

  • But in Enterprise projects, heroes close the most enhancement issues;

  • Further, in Enterprise projects, the variance in the percentage of closed enhancements is much smaller with heroes than without. That is, heroes in Enterprise development result in more control over the project.

Hence, while we might deprecate hero projects for open source development, we should encourage them for Enterprise projects. Note that this is very much the opposite of conventional wisdom (Bier et al., 2011). That said, our reading of the literature is that heroes have been studied much more in OSS projects than in proprietary Enterprise projects. Hence, this finding (that proprietary Enterprise projects benefit from heroes) might have gone undetected for some time.

Figure 6. Public projects: hero and non-hero values of the median time taken to resolve issue, bug, and enhancement reports, respectively. The y-axis is in hours.
Figure 7. Enterprise projects: hero and non-hero values of the median time taken to resolve issue, bug, and enhancement reports, respectively. The y-axis is in hours.

4.3.2. RQ3b: Does having a hero programmer improve the time to resolve issues, bugs and enhancements?

Figures 6 and 7 show boxplots of the time required to close issues, bugs, and enhancements. Note that for these figures, smaller numbers are better.

Like before, the x-axis labels are marked with the results of a statistical comparison of these pairs of distributions. Note that all these statistical ranks are “Rk:1”; i.e., all these pairs of distributions are statistically indistinguishable. That is, there is no effect to report here of heroes or non-heroes on the time required to close issues, bugs, and enhancements.

5. Discussion

What’s old is new. Our results (that heroes are important) echo a decades old concept. In 1975, Fred Brooks wrote of “surgical teams” and the “chief programmer” (Brooks Jr, 1975). He argued that:


  • Much as a surgical team during surgery is led by one surgeon performing the most critical work, while directing the team to assist with less critical parts,

  • Similarly, software projects should be led by one “chief programmer” to develop critical system components while the rest of a team provides what is needed at the right time.

Brooks conjectured that “good” programmers are generally five to ten times as productive as mediocre ones. We note that our definition of “heroes” (80% of the work done by 20% of the developers) is consistent with Brooks’s conjecture that heroes are several times more productive than other team members.

Prior to this research, we had thought that in the era of open source and agile, all such notions of “chief programmers” and “heroes” were historical relics, and that development teams would now be distributing the workload across the whole project.

But based on the results of this paper, we have a different view. Projects are written by people of various levels of skills. Some of those people are so skilled that they become the project heroes. Organizations need to acknowledge their dependency on such heroes, perhaps altering their human resource policies. Specifically, organizations need to recruit and retain more heroes (perhaps by offering heroes larger annual bonuses).

6. Threats to Validity

As with any large scale empirical study, biases can affect the final results. Therefore, any conclusions made from this work must be considered with the following issues in mind:


  • Internal Validity


    • Sampling Bias: Our conclusions are based on the 1,108+538 Public+Enterprise Github projects that started this analysis. It is possible that different initial projects would have led to different conclusions. That said, our initial sample is very large, so we have some confidence that it represents an interesting range of projects. As evidence of that, we note that our sampling bias is less pronounced than in other Github studies since we explored both Public and Enterprise projects (and many prior studies only explored Public projects).

    • Evaluation Bias: In RQ3b, we said that there is no difference between heroes and non-heroes in the time required to close issues, bugs, and enhancements. While that statement is true, the conclusion is scoped by the evaluation metrics we used to write this paper. It is possible that, using other measurements, there may well be a difference between these different kinds of projects. This is a matter that needs to be explored in future research.

  • Construct Validity: At various places in this report, we made engineering decisions about (e.g.) team size and what constitutes a “hero” project. While those decisions were made using advice from the literature (e.g. (Gautam et al., 2017)), we acknowledge that other constructs might lead to different conclusions.

  • External Validity: We have relied on issues marked as a ‘bug’ or ‘enhancement’ to count bugs and enhancements, and bug and enhancement resolution times. In Github, a bug or enhancement might not be marked in an issue but in commits. There is also a possibility that a project team might use different tag identifiers for bugs and enhancements. To reduce the impact of this problem, we took the precaution of including the various tag identifiers from Cabot et al. (Cabot et al., 2015). We also took the precaution of removing pull merge requests from the commits to remove any extra contributions attributed to the hero programmer.

  • Statistical Validity: To increase the validity of our results, we applied two statistical tests, bootstrap and A12. Hence, any time this paper reports that “X was different from Y”, that report is based on both an effect size test and a statistical significance test.
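The tag-identifier precaution mentioned under External Validity can be sketched as a small synonym map. The synonyms shown here are illustrative stand-ins; the actual identifiers came from Cabot et al. (2015):

```python
# Illustrative synonym map; the real identifier list was taken from
# Cabot et al. (2015) and may differ from these examples.
LABEL_SYNONYMS = {
    "bug": {"bug", "defect", "type: bug"},
    "enhancement": {"enhancement", "feature", "type: feature"},
}

def normalize_label(raw: str):
    """Map a raw Github issue label to 'bug', 'enhancement', or None."""
    raw = raw.strip().lower()
    for canonical, synonyms in LABEL_SYNONYMS.items():
        if raw in synonyms:
            return canonical
    return None
```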

7. Conclusion

The established wisdom in the literature is to deprecate “heroes”, i.e., a small percentage of the staff responsible for most of the progress on a project. After mining 661 Public and 171 Enterprise Github projects, we assert that it is time to revise that wisdom:


  • Overwhelmingly, most projects are hero projects, particularly when we look at medium to large projects. That is, discussions about the merits of avoiding heroes are really relevant only to smaller projects.

  • Heroes do not significantly affect the rate at which issues or bugs are closed.

  • Nor do they influence the time required to address issues, bugs or enhancements.

  • Heroes positively influence the rate at which enhancement requests are managed within Enterprise projects.

The only place where our results agree with established wisdom is the enhancement rates of non-hero Public projects. In this particular case, we saw that non-hero Public projects closed the most enhancement issues. That said, given the first point listed above, that benefit for non-hero projects is very rare.

In summary, our empirical results call for a revision of a long-held truism in software engineering. Software heroes are far more common and valuable than suggested by the literature, particularly for medium to large Enterprise developments. Organizations should reflect on better ways to find and retain more of these heroes.

8. Acknowledgements

The first and second authors conducted this research as part of an industrial internship in the summer of 2017. We express our gratitude to our industrial partner for the opportunity to mine hundreds of their Enterprise projects. Special thanks also to our colleagues and mentors there for their valuable feedback.


  • Arcuri and Briand (2011) Andrea Arcuri and Lionel Briand. 2011. A practical guide for using statistical tests to assess randomized algorithms in software engineering. In Software Engineering (ICSE), 2011 33rd International Conference on. IEEE, 1–10.
  • Athanasiou et al. (2014) Dimitrios Athanasiou, Ariadi Nugroho, Joost Visser, and Andy Zaidman. 2014. Test code quality and its relation to issue handling performance. IEEE Transactions on Software Engineering 40, 11 (2014), 1100–1125.
  • Bach (1995) James Bach. 1995. Enough about process: what we need are heroes. IEEE Software 12, 2 (1995), 96–98.
  • Bier et al. (2011) Norman Bier, Marsha Lovett, and Robert Seacord. 2011. An online learning approach to information systems security education. In Proceedings of the 15th Colloquium for Information Systems Security Education.
  • Bird et al. (2009) Christian Bird, Peter C Rigby, Earl T Barr, David J Hamilton, Daniel M German, and Prem Devanbu. 2009. The promises and perils of mining git. In Mining Software Repositories, 2009. MSR’09. 6th IEEE International Working Conference on. IEEE, 1–10.
  • Bissyandé et al. (2013) Tegawendé F Bissyandé, David Lo, Lingxiao Jiang, Laurent Réveillere, Jacques Klein, and Yves Le Traon. 2013. Got issues? who cares about it? a large scale investigation of issue trackers from github. In Software Reliability Engineering (ISSRE), 2013 IEEE 24th International Symposium on. IEEE, 188–197.
  • Boehm (2006) Barry Boehm. 2006. A view of 20th and 21st century software engineering. In Proceedings of the 28th international conference on Software engineering. ACM, 12–29.
  • Brooks Jr (1975) Frederick P Brooks Jr. 1975. The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition, 1/E. Pearson Education India.
  • Cabot et al. (2015) Jordi Cabot, Javier Luis Cánovas Izquierdo, Valerio Cosentino, and Belén Rolandi. 2015. Exploring the use of labels to categorize issues in open-source software projects. In Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on. IEEE, 550–554.
  • Cosentino et al. (2017) Valerio Cosentino, Javier L Cánovas Izquierdo, and Jordi Cabot. 2017. A Systematic Mapping Study of Software Development With GitHub. IEEE Access 5 (2017), 7173–7192.
  • Cullom and Cullom (2006) Charmayne Cullom and Richard Cullom. 2006. Software Development: Cowboy or Samurai. Communications of the IIMA 6, 2 (2006), 1.
  • Dias et al. (2016) Luiz Felipe Dias, Igor Steinmacher, Gustavo Pinto, Daniel Alencar da Costa, and Marco Gerosa. 2016. How Does the Shift to GitHub Impact Project Collaboration?. In Software Maintenance and Evolution (ICSME), 2016 IEEE International Conference on. IEEE, 473–477.
  • Efron and Tibshirani (1994) Bradley Efron and Robert J Tibshirani. 1994. An introduction to the bootstrap. Chapman and Hall, London.
  • Gautam et al. (2017) Aakash Gautam, Saket Vishwasrao, and Francisco Servant. 2017. An empirical study of activity, popularity, size, testing, and stability in continuous integration. In Proceedings of the 14th International Conference on Mining Software Repositories. IEEE Press, 495–498.
  • Ghotra et al. (2015) Baljinder Ghotra, Shane McIntosh, and Ahmed E Hassan. 2015. Revisiting the impact of classification techniques on the performance of defect prediction models. In Proceedings of the 37th International Conference on Software Engineering-Volume 1. IEEE Press, 789–800.
  • Github (2017) Github. 2017. Github Showcases. https://github.com/showcases. (2017). [Online; accessed 13-October-2017].
  • Goeminne and Mens (2011) Mathieu Goeminne and Tom Mens. 2011. Evidence for the pareto principle in open source software activity. In the Joint Proceedings of the 1st International Workshop on Model Driven Software Maintenance and 5th International Workshop on Software Quality and Maintainability. 74–82.
  • Gupta et al. (2014) Monika Gupta, Ashish Sureka, and Srinivas Padmanabhuni. 2014. Process mining multiple repositories for software defect resolution from control and organizational perspective. In Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 122–131.
  • Hislop et al. (2002) Gregory W Hislop, Michael J Lutz, J Fernando Naveda, W Michael McCracken, Nancy R Mead, and Laurie A Williams. 2002. Integrating agile practices into software engineering courses. Computer science education 12, 3 (2002), 169–185.
  • Jarczyk et al. (2014) Oskar Jarczyk, Błażej Gruszka, Szymon Jaroszewicz, Leszek Bukowski, and Adam Wierzbicki. 2014. Github projects. quality analysis of open-source software. In International Conference on Social Informatics. Springer, 80–94.
  • Kalliamvakou et al. (2014) Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M German, and Daniela Damian. 2014. The promises and perils of mining github. In Proceedings of the 11th working conference on mining software repositories. ACM, 92–101.
  • Kocaguneli et al. (2013) E. Kocaguneli, T. Zimmermann, C. Bird, N. Nagappan, and T. Menzies. 2013. Distributed development considered harmful?. In 2013 35th International Conference on Software Engineering (ICSE). 882–890. https://doi.org/10.1109/ICSE.2013.6606637
  • Krishnamurthy (2002) Sandeep Krishnamurthy. 2002. Cave or community?: An empirical examination of 100 mature open source projects. (2002).
  • Martínez-Torres and Diaz-Fernandez (2014) M Rocío Martínez-Torres and María del Carmen Diaz-Fernandez. 2014. Current issues and research trends on open-source software communities. Technology Analysis & Strategic Management 26, 1 (2014), 55–68.
  • Mittas and Angelis (2013) Nikolaos Mittas and Lefteris Angelis. 2013. Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Transactions on Software Engineering 39, 4 (2013), 537–551.
  • Mockus et al. (2002) Audris Mockus, Roy T Fielding, and James D Herbsleb. 2002. Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology (TOSEM) 11, 3 (2002), 309–346.
  • Moniruzzaman and Hossain (2013) ABM Moniruzzaman and Dr Syed Akhter Hossain. 2013. Comparative study on agile software development methodologies. arXiv preprint arXiv:1307.3356 (2013).
  • Morcov (2012) Stefan Morcov. 2012. Complex IT Projects in Education: The Challenge. International Journal of Computer Science Research and Application 2 (2012), 115–125.
  • Munaiah et al. (2017) Nuthan Munaiah, Steven Kroh, Craig Cabrey, and Meiyappan Nagappan. 2017. Curating GitHub for engineered software projects. Empirical Software Engineering (2017), 1–35. https://doi.org/10.1007/s10664-017-9512-6
  • Padhye et al. (2014) Rohan Padhye, Senthil Mani, and Vibha Singhal Sinha. 2014. A study of external community contribution to open-source projects on GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 332–335.
  • Peterson (2013) Kevin Peterson. 2013. The github open source development process. Technical Report. Mayo Clinic.
  • Pinto et al. (2016) Gustavo Pinto, Igor Steinmacher, and Marco Aurélio Gerosa. 2016. More common than you think: An in-depth study of casual contributors. In Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, Vol. 1. IEEE, 112–123.
  • Rastogi et al. (2017) Ayushi Rastogi, Nachiappan Nagappan, and Pankaj Jalote. 2017. Empirical analyses of software contributor productivity. Ph.D. Dissertation. IIIT-Delhi.
  • Reyes López (2017) Arturo Reyes López. 2017. Analyzing GitHub as a Collaborative Software Development Platform: A Systematic Review. (2017).
  • Robles et al. (2009) Gregorio Robles, Jesus M Gonzalez-Barahona, and Israel Herraiz. 2009. Evolution of the core team of developers in libre software projects. In Mining Software Repositories, 2009. MSR’09. 6th IEEE International Working Conference on. IEEE, 167–170.
  • Torres et al. (2011) MR Martinez Torres, SL Toral, M Perales, and F Barrero. 2011. Analysis of the core team role in open source communities. In Complex, Intelligent and Software Intensive Systems (CISIS), 2011 International Conference on. IEEE, 109–114.
  • Tsay et al. (2014) Jason Tsay, Laura Dabbish, and James Herbsleb. 2014. Influence of social and technical factors for evaluating contribution in GitHub. In Proceedings of the 36th international conference on Software engineering. ACM, 356–366.
  • Wood-Harper and Wood (2005) Trevor Wood-Harper and Bob Wood. 2005. Multiview as social informatics in action: past, present and future. Information Technology & People 18, 1 (2005), 26–32.
  • Yamashita et al. (2015) Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei, Ahmed E Hassan, and Naoyasu Ubayashi. 2015. Revisiting the applicability of the pareto principle to core development teams in open source software projects. In Proceedings of the 14th International Workshop on Principles of Software Evolution. ACM, 46–55.
  • Ye and Kishida (2003) Yunwen Ye and Kouichi Kishida. 2003. Toward an understanding of the motivation Open Source Software developers. In Proceedings of the 25th international conference on software engineering. IEEE Computer Society, 419–429.