New developer metrics: Are comments as crucial as code contributions?

06/29/2020
by Abdulkadir Seker, et al.
Yildiz Technical University

Open-source code development has become widespread in recent years, and open-source software platforms have become popular as millions of developers from diverse locations contribute to shared projects. On these platforms, a wide range of information about projects and developers can be obtained from user activity, and this information is used in the form of developer metrics to solve a variety of challenges. In this study, we proposed new developer metrics, including commenting and issue-related activity, that require less information. We concluded that commenting on any feature of a project can be as valuable as code contribution. In addition, metrics based only on the existence of an activity, rather than its quantity, were shown to offer considerable results as well. We found that issues were crucial in identifying user contributions: even if a developer contributes to only one issue on a project, the relationship between the developer and the project is tight. Our hit scores are relatively low because of the sparsity problem of our dataset; even so, we believe that we have presented improvable and noteworthy new developer metrics.


1 Introduction

Thanks to the increasing capabilities of open source software development tools, the number of open source users and projects grows each year. These platforms host millions of developers, each with a different character and skill set, as well as a wide variety of projects that offer solutions to different problems. In environments with such a large amount of data, it is difficult for developers to find projects similar to their own, discern projects of interest, and reach projects to which they can contribute. As developers primarily use search engines or in-platform search menus to find projects, the constraints of text-based search McMillan2012 and the challenge of finding the right keywords cause some projects to be missed Hu2015. While various project recommendation systems are being developed to overcome this problem, projects must be rated by users for recommendation models to work properly. In the same way that viewers rate movies they have watched, developers would need to rate the projects in which they are interested; however, this is not currently the case on (open-source) software development platforms. Instead, the score that a user gives a project must be calculated from software and developer metrics extracted from the activity and features of both developers and projects.

Developer metrics used in many challenges include the number of lines of code, developers' degrees of connection to one another, past experience, and common features (nationality, location, occupation, gender, previously used programming languages, etc.). These metrics offer solutions to various problems in open source software development and distributed coding, including automatic assignment (of tasks, issues, bugs, or reviewers) DeLima2015 ; Badashian2015 ; Junior2018 , project recommendation systems Zhang2014 ; Sun2018 , software defect detection OzcanKini2018 , etc.

In this study, new developer metrics are presented for use on different problems. To evaluate the metrics, we developed a project recommendation system and obtained remarkable results. The recommendation system was built on a dataset drawn from GitHub. Most GitHub users are familiar with (i.e., contribute to or take an interest in) relatively few of the projects hosted on the platform, which creates a critical sparsity problem. To reflect this problem, we selected a dataset with a high ratio of projects to users.

The structure of this paper is as follows. In the background section, we describe the literature on previously proposed metrics and project recommendation models for open-source software development platforms. In the research design section, the dataset used in this study is introduced and the proposed metrics are detailed. In the final section, the proposed metrics are assessed in terms of their accuracy rates in the project recommendation system.

2 Background

A pull request (PR) allows a user to inform others about changes they have pushed to a branch of a repository on GitHub. PRs are a key feature for contributing code by different developers to a single project Gousios2014 . Metrics related to this feature have been proposed to solve different PR problems. A PR must be reviewed before it can be merged; if the review is positive, the PR is integrated into the master branch. Finding the right reviewer is an important factor in ensuring rapid and fair PR reviews. In this context, different metrics have been used to address the problem of automatic PR reviewer assignment. Existing literature has proposed various metrics to solve this problem, such as PR acceptance rate within a project, active developers on a project Junior2018 , PR file location Thongtanunam2015 , pull requesters' social attributes Tsay2014 , and textual features of the PR Yu2014 , among others.

Closing a PR with an issue, PR age, and mentioning (@) a user in the PR comments have all been used to determine the priority of a PR VanDerVeen2015 . Cosentino et al. proposed three developer metrics (community composition, acceptance rates, and becoming a collaborator) to investigate project openness and stated that project owners could evaluate the attractiveness of their projects using these metrics Cosentino2014 .

Developer metrics are also used in the detection of software defects. In one study, defects were estimated using different metrics grouped at the file and commit levels. The number of files belonging to a commit, the most modified file among all files in a commit, the time between the first and last commits to a file, and a given developer's experience on a committed file were identified as important metrics OzcanKini2018 .

Reliability metrics quantitatively express the reliability of a software product Kaur2014 . To measure reliability in open source projects, metrics such as the number of contributors, the number of commits, the number of changed lines per commit, and certain metrics derived from them are used. Tiwari et al. proposed two important reliability metrics: the number of contributors and the number of commits per 1,000 lines of code Tiwari2012 .

Code ownership metrics are also important for open source software. One study ranked developer ownership according to code contributions using the number of modified (touched) files Bird2011 . Another study used the number of changed lines in a file (churn) to address this problem Munson1998 . Foucault et al. confirmed the relationship between these code ownership metrics and software quality Foucault2015 .

Recommender systems are an important research topic in software engineering Happel2008 ; Robillard2010 . Ordinary recommendation models use previously known user-item matrices; in other words, the rating given by a user to an item is known, and the essential research problem is to estimate these ratings with different algorithms and models Sharma2013 . However, the situation is different on open source software platforms. Considering the developer as the user and the project (repository) as the item, the rating given by a developer to a project is unknown. The first problem to solve, therefore, is how to create an accurate developer-project matrix. At this point, different developer metrics come into play.

GitHub is the world's largest code host, with more than 100 million repositories to which over 40 million developers have contributed (https://en.wikipedia.org/wiki/GitHub). As such, GitHub is a reasonable choice for developing a project recommendation model. Sun et al. relied on basic user activity to develop a model using GitHub data; specifically, when rating a project for a developer, they used like-star-create activities related to projects Sun2018 . We also used this scoring method as a point of comparison with our metrics in this paper. In another study, a developer's social connections, programming in a common language, and contributions to the same projects or files were used as metrics Casalnuovo2015 . Liu et al. designed a neural network-based recommendation system using metrics such as working at the same company, previous collaboration with the project owner, and different time-related features of a project Liu2018 . In a study aiming to predict whether a user would join a project in the future, the metrics used included a developer's GitHub age (i.e., when their account was opened), the number of projects they had joined, the programming languages of their commits, how many times a project was starred, the number of developers that joined a project, and the number of commits to a project Nielek2017 .

3 Research Design

3.1 Dataset

One of the most serious challenges in developing a recommender system is sparsity Niu2016 , a problem that occurs when most users rate only a few items Guo2012 . This issue is present on GitHub because it is not possible for developers to be aware of the millions of repositories on the platform. In the studies mentioned in the previous section, we observed that limited (less sparse) data were used, which is contrary to the nature of the GitHub environment. Although the results obtained in these studies appear successful, the question remains how successful the proposed algorithms will be on real platform data. In light of this, a sub-dataset reflective of the sparsity problem inherent to GitHub was used in this study. The dataset contained all data related to 100 developers and 41,280 projects Seker2020 . The creators of the dataset indicated that they selected the most active users on the platform. They then extracted all related data for these users from GitHub (commits, issues, pull requests, comments about these activities, watchers, etc.). Thus, we anticipated that the recommender system we developed would produce results parallel to those for the larger dataset of the platform as a whole. In this regard, although our evaluation results seem weak compared to similar studies, we believe that the proposed metrics are worthy of consideration.
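To make the scale of this problem concrete, the sparsity of a developer-project interaction matrix can be quantified as the fraction of empty cells. The minimal Python sketch below is ours, not from the study; the function name and the example figures are illustrative assumptions.

```python
import numpy as np

def sparsity(matrix: np.ndarray) -> float:
    """Fraction of developer-project cells with no recorded interaction."""
    return 1.0 - np.count_nonzero(matrix) / matrix.size

# With 100 developers and 41,280 projects (4,128,000 cells), even a
# thousand interactions per developer would leave ~97.6% of the matrix empty.
```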

3.2 Project Recommendation System

Designing a project recommender system for open source software development platforms involves two stages: first, the project-developer rating matrix is generated using specific metrics; second, the top-k projects are recommended to each developer. The accuracy of the suggestions is then evaluated. In this study, the recommendation model was designed as follows:

  1. Different developer metrics were used to obtain the score matrix. The values of the features (metrics) were scaled from 0 to 10. As in the movie-user model, each developer thus gives a rating (0–10) to each project.

  2. The similarity between unknown projects (we assumed that "unknown" projects were those to which a developer had no relationship and had made no contributions) and rated projects was used to calculate the rating of the unknown projects Sun2018 . The similarity between projects was calculated using cosine similarity. The rating of an unknown project was then computed as the dot product of the similarity values between the developer's rated projects and the unknown project with the ratings of those projects (Equation 1). An example scenario involving this calculation is presented in Figure 1; a short Python sketch of this step follows the figure.

     $r_{u,p} = \sum_{q \in R_u} \mathrm{sim}(p, q) \, r_{u,q}$   (1)

     where $R_u$ is the set of projects rated by developer $u$, $\mathrm{sim}(p, q)$ is the cosine similarity between projects $p$ and $q$, and $r_{u,q}$ is the rating the developer gave to project $q$.
  3. The top 5 highest-rated projects among the unknown projects were recommended to each developer.

  4. The accuracy of the recommendations was evaluated.

Figure 1: Calculating the rating of an unrated project using its similarity to rated projects.
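To make the pipeline concrete, here is a minimal Python sketch of steps 1-3 under stated assumptions: ratings sit in a dense developer-project matrix, and project similarity is computed with cosine similarity over the matrix columns. The study does not pin down the exact feature vectors used for similarity, so that choice, along with all names below, is illustrative rather than the authors' implementation.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def recommend_top_k(rating_matrix, developer, k=5):
    """Recommend k unknown projects to one developer.

    rating_matrix: (n_developers, n_projects) array of 0-10 ratings,
                   where 0 marks an unknown (unrated) project.
    """
    # Project-project cosine similarity; here each project's feature
    # vector is its column of developer ratings (an assumption).
    similarity = cosine_similarity(rating_matrix.T)
    ratings = rating_matrix[developer]
    rated = ratings > 0
    scores = np.full(ratings.shape[0], -np.inf)
    for p in np.where(~rated)[0]:
        # Equation 1: dot product of the similarities between the unknown
        # project and the rated projects with the ratings themselves.
        scores[p] = similarity[p, rated] @ ratings[rated]
    return np.argsort(scores)[::-1][:k]  # top-k highest estimated ratings
```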

3.3 Evaluation Techniques

While recommending projects to developers, there must be a ground truth against which the proposed projects can be evaluated. Unlike ordinary recommender systems, ours is an unsupervised model with no explicit ratings. The evaluation criteria used in related studies are set forth below.

  • A user-project-rating matrix was split randomly into test and training subsets. Accuracy or recall scores were then calculated from the intersection between the top-n scores of the test and training subsets Sun2018 . However, another study stated that this method should not be used on platforms like GitHub where time is an important parameter, pointing out that the problem of predicting past activity with future data will occur when using k-fold cross-validation by dividing the data randomly Junior2018 .

  • In another study, the accuracy of the recommended projects was evaluated using the developer's past commits to the related project. A recommendation was considered correct if the number of the developer's commits to the project exceeded a threshold, set to the average number of commits per project in the dataset Liu2018 .

  • In a study predicting whether a developer would join a project in the future, the dataset was split into two different sets by time. In this way, the predicted result was verified with actual future data Nielek2017 .

In this study, GitHub's "watching" feature was used as the ground truth. GitHub users can follow, or "watch," projects whose development they want to monitor Sheoran2014 . If a developer is watching a project, this indicates an interest in the project; thus, "watching" can be treated as a real evaluation criterion. In our model, the top-n projects were recommended to each developer. If a recommended project was among the developer's watched projects, it was considered a hit (i.e., a correct recommendation). The case of a developer watching fewer than n projects is taken into account in the score (Equation 2).

$\mathrm{score}_d = \dfrac{\mathrm{hits}_d}{\min(n, |W_d|)}$   (2)

where $\mathrm{hits}_d$ is the number of hits among the top-$n$ projects recommended to developer $d$, and $W_d$ is the set of projects that $d$ watches.

The full name of a GitHub repository (project) is formed by concatenating the owner's username with the repository name. In analyzing our results, we noticed that the model sometimes recommended a project that matched only the owner's name, that is, it found an incorrect project by the correct owner. We counted such a recommendation as half a hit, since recommending the correct owner still allows the developer to access the owner's other projects.

An example scenario demonstrating this situation is given in Table 1. The projects recommended for Alice are listed in the first column. Since four of them are among the projects that Alice watches, the initial score is 4. In addition, there are two projects by a developer named "fengmk2" among Alice's watched projects ("fengmk2/parameter" and "fengmk2/cnpmjs.org"). For the fourth proposed project, "fengmk2/emoji", the owner's name was guessed correctly but the repository name was missed. In this case, Alice will still become aware of other projects by "fengmk2". Thus, a half-point is added to the initial score, and the final score is 4.5.

Top-5 Recommendation      Full-name match           Owner-only match
iojs/io.js                iojs/io.js                -
juliangruber/co-read      juliangruber/co-read      -
koajs/compose             koajs/compose             -
fengmk2/emoji             -                         fengmk2
visionmedia/co            visionmedia/co            -

Table 1: An example recommendation list in which one project matches only the project owner.
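The scoring rules above can be reproduced with a short sketch. The function below counts raw hits for one developer; per our reading of Equation 2, this count is then divided by min(n, |W_d|). Function and variable names are ours.

```python
def developer_hits(recommended, watched):
    """Raw hit count for one developer's recommendation list; an
    owner-only match ('owner/repo' full names) earns half credit."""
    watched = set(watched)
    owners = {name.split("/")[0] for name in watched}
    hits = 0.0
    for repo in recommended:
        if repo in watched:
            hits += 1.0                      # exact full-name match
        elif repo.split("/")[0] in owners:
            hits += 0.5                      # correct owner, wrong repository
    return hits

# Alice's example from Table 1: four exact matches plus one owner-only match.
recommended = ["iojs/io.js", "juliangruber/co-read", "koajs/compose",
               "fengmk2/emoji", "visionmedia/co"]
watched = ["iojs/io.js", "juliangruber/co-read", "koajs/compose",
           "visionmedia/co", "fengmk2/parameter", "fengmk2/cnpmjs.org"]
print(developer_hits(recommended, watched))  # 4.5
```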

In this way, a project recommendation model was created for open source platforms. The algorithm of the recommendation model is presented in Figure 2, starting with the selection of a feature as a metric and ending with the calculation of hit scores.

Figure 2: Flowchart of the project recommendation model.

4 Empirical Results

In this section, the different developer metrics are presented. These metrics provide information about a developer's past activity on a project. All metrics were scaled from 0 to 10 using min-max normalization, so that the developer-project relationship was rated in the range 0–10 (as with a viewer's rating of a movie). The results were calculated as top-5 recommendation hit scores. We experimented with several further metrics, including ones based on coding language, ones obtained as the ratio of how many times a developer performed an activity to the total count of that activity, and ones created using different normalization methods; however, apart from the single metrics, only metrics that achieved hit scores greater than 5% are reported in this study.
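As a minimal illustration of this scaling step (the exact scope of the normalization, here per metric over all developer-project pairs, is our assumption, as are the names and numbers):

```python
import numpy as np

def scale_0_10(counts: np.ndarray) -> np.ndarray:
    """Min-max normalize raw activity counts to the 0-10 rating range."""
    lo, hi = counts.min(), counts.max()
    if hi == lo:                 # degenerate case: metric carries no signal
        return np.zeros_like(counts, dtype=float)
    return 10.0 * (counts - lo) / (hi - lo)

# Hypothetical raw counts for one metric across developer-project pairs:
print(scale_0_10(np.array([0, 3, 18])))   # -> [0., 1.667, 10.] (approx.)
```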

4.1 Single Metrics

Developer activity on projects was treated as a metric. Activity includes all kinds of comments, code contributions, reviews, and so on. In this section, each metric was used individually in order to evaluate its significance. These metrics count the number of activities per project for a given developer (Table 2).

Metric                  Definition
1  issue_opened         Number of issues opened
2  issue_commented      Number of comments on issues
3  issue_closed         Number of issues closed
4  issue_closedwithPR   Number of issues closed with a PR
5  issue_assigned       Number of issues assigned
6  commit_commented     Number of comments on commits
7  commit_authored      Number of commits authored
8  commit_committed     Number of commits committed
9  pr_opened            Number of PRs opened
10 pr_merged            Number of PRs merged
11 pr_assigned          Number of PRs assigned
12 pr_commented         Number of comments on PRs

Table 2: Single developer metrics

All single metrics were given to the model individually, and the scores presented in Table 3 were obtained using the evaluation technique outlined above. In addition to our metrics, a metric extracted from the study of Sun et al. was added for comparison. They scored developers and projects using like-star-create activities and used text data extracted from projects' README and source code files to find project similarities Sun2018 . There were approximately 22,000 repositories and 1,700 developers in their dataset, which was created using data from four groups of projects.

We had planned to use this less sparse dataset to make a fair comparison, but it was not shared publicly, and we were unable to reach the authors; we therefore applied their rating algorithm to our own dataset.

Metric Hit Score (%)
issue_commented 15.3
issue_closedwithPR 15
issue_opened 14
pr_opened 13.7
commit_commented 11.9
pr_commented 10.9
pr_merged 9.3
Sun’s metric 7.7
commit_authored 6
commit_committed 5.7
issue_closed 3
issue_assigned 2.8
pr_assigned 2.5
Table 3: Single developer metrics scores

When the results are analyzed, it is clear that the issue-related metrics are crucial even by themselves. Closing an issue with a PR means that the PR fixed a bug or issue in the project VanDerVeen2015 . As our results indicate, issue_closedwithPR is a remarkable metric. Opening an issue or PR is also an important metric. The most interesting conclusion that can be drawn from these results is that comments have higher hit scores than direct code contributions.

4.2 Fusion Metrics

In these results, we observed that some metric groups came to the forefront. New metrics can be proposed by grouping comment-related, code contribution-related, or other metrics that share a common feature. In this context, the following fusion metrics were created from the single metrics (a sketch of their construction follows the list).

  1. count: the sum of all single metrics.

  2. contribution: the sum of all code contribution-related metrics.

  3. comment: the sum of all comment-related metrics.

  4. issue_related: the sum of all issue-related metrics.

  5. pr_related: the sum of all PR-related metrics.

  6. commit_related: the sum of all commit-related metrics.

  7. commit2comment: commit_committed divided by commit_commented.

  8. issue2comment: issue_opened divided by issue_commented.

  9. pr2comment: pr_opened divided by pr_commented.

  10. code2comment: the ratio of two fusion metrics (contribution divided by comment).
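A sketch of how these fusion metrics could be assembled from the single-metric counts is given below. The DataFrame layout, the exact grouping of "code contribution" metrics, and the zero-denominator handling are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np
import pandas as pd

SINGLES = ["issue_opened", "issue_commented", "issue_closed",
           "issue_closedwithPR", "issue_assigned",
           "commit_commented", "commit_authored", "commit_committed",
           "pr_opened", "pr_merged", "pr_assigned", "pr_commented"]

def add_fusion_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """df has one row per developer-project pair and one column per
    single metric from Table 2 (raw counts, before 0-10 scaling)."""
    df["count"] = df[SINGLES].sum(axis=1)
    df["comment"] = df[[c for c in SINGLES if c.endswith("_commented")]].sum(axis=1)
    # Which single metrics count as "code contribution" is our guess.
    contribution = ["commit_authored", "commit_committed",
                    "pr_opened", "pr_merged", "issue_closedwithPR"]
    df["contribution"] = df[contribution].sum(axis=1)
    for group in ("issue", "pr", "commit"):
        cols = [c for c in SINGLES if c.startswith(group + "_")]
        df[group + "_related"] = df[cols].sum(axis=1)
    # Ratio-based metrics; left as NaN where the denominator is zero.
    df["commit2comment"] = df["commit_committed"] / df["commit_commented"].replace(0, np.nan)
    df["issue2comment"] = df["issue_opened"] / df["issue_commented"].replace(0, np.nan)
    df["pr2comment"] = df["pr_opened"] / df["pr_commented"].replace(0, np.nan)
    df["code2comment"] = df["contribution"] / df["comment"].replace(0, np.nan)
    return df
```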

The results of the fusion metrics are presented in Table 4.a. Most fusion metrics had a positive impact on the hit score. comment in particular was a remarkable metric, showing that commenting is as important as code contribution. The issue_related metric also drew our attention. The ratio-based metrics were not as successful as the others.

(a) Fusion metrics

Fusion Metric     Hit Score (%)
comment           17
issue_related     16
contribution      15.6
count             15.2
pr_related        14.9
issue2comment     14.2
code2comment      13
commit_related    11
pr2comment        11
Sun's metric      7.7
commit2comment    6.9

(b) Binary fusion metrics

Binary Fusion Metric    Hit Score (%)
issue_related           20
count                   19
contribution            18.8
comment                 18.4
pr_related              15.5
commit_related          11.6
Sun's metric            7.7
issue2comment           x
pr2comment              x
commit2comment          x
code2comment            x

Table 4: Fusion developer metric scores: (a) quantitative fusion metrics, (b) binary fusion metrics ("x" marks ratio-based metrics that are undefined on binary values).

4.3 Binary Metrics

In the previous section, we tried different metrics for project recommendation. Interestingly, the metric consisting only of comments achieved a higher success rate than the metric created by summing all single metrics. We continued to study different metrics to analyze whether success could be increased with even less information.

The single metrics above record how many of each activity a developer performed (for instance, if John opened 18 issues in projectX, the John–projectX rating is 18). Alternatively, a set of metrics can be created that simply records whether the activity occurred at all (for instance, if John opened at least one issue in projectX, the John–projectX rating is 1; if he opened no issues in projectY, the John–projectY rating is 0). In this context, the binary metrics were created using Equation 3.

$b_{d,p} = \begin{cases} 1, & \text{if } c_{d,p} > 0 \\ 0, & \text{otherwise} \end{cases}$   (3)

where $c_{d,p}$ is the number of times developer $d$ performed the activity on project $p$.
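Applied to raw activity counts, Equation 3 is a one-line thresholding step. The sketch below uses the John example with hypothetical counts:

```python
import numpy as np

# Equation 3 applied to a count matrix: the rating is 1 if the developer
# performed the activity on the project at least once, 0 otherwise.
def to_binary(counts: np.ndarray) -> np.ndarray:
    return (counts > 0).astype(int)

# John's issue_opened counts on projectX and projectY (example above):
print(to_binary(np.array([18, 0])))   # [1 0]
```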

4.3.1 Binary Fusion Metrics

Binary single metrics and the ratio-based fusion metrics computed from them consist only of 0s and 1s; using them directly does not generate meaningful results (hence the "x" entries in Table 4.b). Therefore, binary fusion metrics were generated by summing the binary single metrics, and only the results of these fusion metrics are given (Table 4.b).

The results improved overall, and the top metrics changed order, although the best five metrics were the same as in the previous section. The comment metric's fall in the ranking suggests that, when comments are used as a metric, their count matters: the mere presence of a comment is not sufficient on its own. On the other hand, issue_related leads among the binary metrics, indicating that issues are the most important feature of the developer-project relationship. Apart from our findings on the importance of comments, obtaining better results from issues than from commits was one of the most surprising results of this study.

5 Conclusion

The main purpose of this study was to propose new developer metrics that could be used to solve different software engineering challenges. The features used were extracted from user activity on an open source platform (GitHub). The study focused on finding metrics that would enable greater success with less knowledge, and some fusion metrics gave successful results.

The comment metric clearly gave impressive results. For this metric, quantity is an important parameter: the more comments a developer writes, the more closely related he or she is to the project. In contrast, the results of the binary fusion metrics show that issue_related is a quantity-free metric: it is sufficient to know whether the activity exists at all. In this regard, it is revealing that the issue is a significant feature of open-source platforms. More generally, metrics based only on the existence of an activity (binary metrics) were highly successful, showing that for some activities, quantities are not needed in order to extract knowledge.

We have presented these new developer metrics, but we are curious about why some of them became prominent. In light of this, we are planning another study, involving a survey of junior and senior developers whom we can contact, to understand the ground truth behind our metrics' success (especially the comment metric).

Because of the sparsity problem, our hit scores may not be high compared with those of similar studies. Even so, we believe we have offered new and improvable developer metrics. Moreover, for comparison we added a very similar metric used in Sun et al.'s study Sun2018, and most of our metrics surpassed it. In this context, we plan to apply the obtained metrics to different datasets to test their validity.

In addition, we anticipate that these metrics will be useful for solving various problems. Due to the open source nature of GitHub, many developers besides owners and collaborators contribute to projects; on some projects, external developers make even more contributions than the core team. These metrics can reveal developers' contribution rankings on a project. To implement this, we plan to cooperate with a software company.

References

  • (1) Badashian, A.S., Hindle, A., Stroulia, E.: Crowdsourced bug triaging. In: 2015 IEEE 31st International Conference on Software Maintenance and Evolution, ICSME 2015 - Proceedings, pp. 506–510. Institute of Electrical and Electronics Engineers Inc. (2015). DOI 10.1109/ICSM.2015.7332503
  • (2) Bird, C., Nagappan, N., Murphy, B., Gall, H., Devanbu, P.: Don’t touch my code! Examining the effects of ownership on software quality. In: Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering, pp. 4–14. ACM Press, New York, New York, USA (2011). DOI 10.1145/2025113.2025119. URL http://dl.acm.org/citation.cfm?doid=2025113.2025119
  • (3) Casalnuovo, C., Vasilescu, B., Devanbu, P., Filkov, V.: Developer Onboarding in GitHub: The role of prior social links and language experience. In: 2015 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2015 - Proceedings, pp. 817–828. Association for Computing Machinery, Inc, New York, New York, USA (2015). DOI 10.1145/2786805.2786854. URL http://dl.acm.org/citation.cfm?doid=2786805.2786854
  • (4) Cosentino, V., Izquierdo, J.L.C., Cabot, J.: Three Metrics to Explore the Openness of GitHub projects. arXiv preprint arXiv:1409.4253 (2014). URL http://arxiv.org/abs/1409.4253
  • (5) De Lima, M.L., Soares, D.M., Plastino, A., Murta, L.: Developers assignment for analyzing pull requests. In: Proceedings of the ACM Symposium on Applied Computing, vol. 13-17-Apri, pp. 1567–1572. Association for Computing Machinery, New York, New York, USA (2015). DOI 10.1145/2695664.2695884. URL http://dl.acm.org/citation.cfm?doid=2695664.2695884
  • (6) Foucault, M., Teyton, C., Lo, D., Blanc, X., Falleri, J.R.: On the usefulness of ownership metrics in open-source software projects. In: Information and Software Technology, vol. 64, pp. 102–112. Elsevier (2015). DOI 10.1016/j.infsof.2015.01.013
  • (7) Gousios, G., Pinzger, M., Deursen, A.V.: An exploratory study of the pull-based software development model. In: Proceedings - International Conference on Software Engineering, 1, pp. 345–355. IEEE Computer Society, New York, New York, USA (2014). DOI 10.1145/2568225.2568260. URL http://dl.acm.org/citation.cfm?doid=2568225.2568260
  • (8) Guo, G.: Resolving data sparsity and cold start in recommender systems. In: Proceedings of the 20th international conference on User Modeling, Adaptation, and Personalization, vol. 7379 LNCS, pp. 361–364 (2012). DOI 10.1007/978-3-642-31454-4_36
  • (9) Happel, H.J., Maalej, W.: Potentials and challenges of recommendation systems for software development. In: Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 11–15. ACM Press, New York, New York, USA (2008). DOI 10.1145/1454247.1454251. URL http://portal.acm.org/citation.cfm?doid=1454247.1454251
  • (10) Hu, J., Sun, X., Lo, D., Li, B.: Modeling the evolution of development topics using Dynamic Topic Models. In: 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015 - Proceedings, pp. 3–12. Institute of Electrical and Electronics Engineers Inc. (2015). DOI 10.1109/SANER.2015.7081810
  • (11) Júnior, M.L.d.L., Soares, D.M., Plastino, A., Murta, L.: Automatic assignment of integrators to pull requests: The importance of selecting appropriate attributes. Journal of Systems and Software 144, 181–196 (2018). DOI 10.1016/j.jss.2018.05.065
  • (12) Kaur, G., Bahl, K.: Software Reliability, Metrics, Reliability Improvement Using Agile Process. International Journal of Innovative Science, Engineering & Technology 1(3), 143–147 (2014). URL www.ijiset.com
  • (13) Liu, C., Yang, D., Zhang, X., Ray, B., Rahman, M.M.: Recommending GitHub Projects for Developer Onboarding. IEEE Access 6, 52082–52094 (2018). DOI 10.1109/ACCESS.2018.2869207
  • (14) McMillan, C., Grechanik, M., Poshyvanyk, D.: Detecting similar software applications. In: Proceedings - International Conference on Software Engineering, pp. 364–374 (2012). DOI 10.1109/ICSE.2012.6227178
  • (15) Munson, J.C., Elbaum, S.G.: Code churn: a measure for estimating the impact of code change. In: International Conference on Software Maintenance, pp. 24–31. IEEE (1998). DOI 10.1109/icsm.1998.738486
  • (16) Nielek, R., Jarczyk, O., Pawlak, K., Bukowski, L., Bartusiak, R., Wierzbicki, A.: Choose a Job You Love: Predicting Choices of GitHub Developers. In: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2016, pp. 200–207. Institute of Electrical and Electronics Engineers Inc. (2017). DOI 10.1109/WI.2016.0037
  • (17) Niu, J., Wang, L., Liu, X., Yu, S.: FUIR: Fusing user and item information to deal with data sparsity by using side information in recommendation systems. Journal of Network and Computer Applications 70, 41–50 (2016). DOI 10.1016/j.jnca.2016.05.006
  • (18) Ozcan Kini, S., Tosun, A.: Periodic developer metrics in software defect prediction. In: Proceedings - 18th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2018, pp. 72–81. Institute of Electrical and Electronics Engineers Inc. (2018). DOI 10.1109/SCAM.2018.00016
  • (19) Robillard, M., Walker, R., Zimmermann, T.: Recommendation systems for software engineering. IEEE Software 27(4), 80–86 (2010). DOI 10.1109/MS.2009.161
  • (20) Sharma, L., Gera, A.: A Survey of Recommendation System: Research Challenges. International Journal of Engineering Trends and Technology 4(5), 1989–1992 (2013)
  • (21) Sheoran, J., Blincoe, K., Kalliamvakou, E., Damian, D., Ell, J.: Understanding ”watchers” on GitHub. In: 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings, pp. 336–339. Association for Computing Machinery, Inc, New York, New York, USA (2014). DOI 10.1145/2597073.2597114. URL http://dl.acm.org/citation.cfm?doid=2597073.2597114
  • (22) Sun, X., Xu, W., Xia, X., Chen, X., Li, B.: Personalized project recommendation on GitHub. Science China Information Sciences 61(5), 1–14 (2018). DOI 10.1007/s11432-017-9419-x
  • (23) Thongtanunam, P., Tantithamthavorn, C., Kula, R.G., Yoshida, N., Iida, H., Matsumoto, K.I.: Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review. In: 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015 - Proceedings, pp. 141–150. Institute of Electrical and Electronics Engineers Inc. (2015). DOI 10.1109/SANER.2015.7081824
  • (24) Tiwari, V., Pandey, R.: Open source software and reliability metrics. In: International Journal of Advanced Research in Computer and Communication Engineering, pp. 808–815 (2012). URL https://pdfs.semanticscholar.org/ce3f/2129c3c735ce4e0d4ac048ea36f990903022.pdf
  • (25) Tsay, J., Dabbish, L., Herbsleb, J.: Influence of social and technical factors for evaluating contribution in GitHub. In: Proceedings - International Conference on Software Engineering, 1, pp. 356–366. IEEE Computer Society, New York, New York, USA (2014). DOI 10.1145/2568225.2568315. URL http://dl.acm.org/citation.cfm?doid=2568225.2568315
  • (26) Van Der Veen, E., Gousios, G., Zaidman, A.: Automatically prioritizing pull requests. In: IEEE International Working Conference on Mining Software Repositories, vol. 2015-Augus, pp. 357–361. IEEE Computer Society (2015). DOI 10.1109/MSR.2015.40
  • (27) Yu, Y., Wang, H., Yin, G., Ling, C.X.: Reviewer recommender of pull-requests in GitHub. In: Proceedings - 30th International Conference on Software Maintenance and Evolution, ICSME 2014, pp. 609–612. Institute of Electrical and Electronics Engineers Inc. (2014). DOI 10.1109/ICSME.2014.107
  • (28) Zhang, L., Zou, Y., Xie, B., Zhu, Z.: Recommending relevant projects via user behaviour: An exploratory study on Github. In: 1st International Workshop on Crowd-Based Software Development Methods and Technologies, CrowdSoft 2014 - Proceedings, pp. 25–30. Association for Computing Machinery, Inc, New York, New York, USA (2014). DOI 10.1145/2666539.2666570. URL http://dl.acm.org/citation.cfm?doid=2666539.2666570
  • (29) Şeker, A., Diri, B., Arslan, H.: Summarising Big Data: Common GitHub Dataset for Software Engineering Challenges (2020). URL http://arxiv.org/abs/2006.04967