Which Pull Requests Get Accepted and Why? A study of popular NPM Packages

by Tapajit Dey, et al.

Background: Pull Request (PR) integrators often have to handle many concurrent PRs, so the ability to gauge which PRs will be accepted can help them balance their workload. PR creators would benefit from knowing whether certain characteristics of their PRs increase the chances of acceptance. Aim: We modeled the probability that a PR will be accepted within a month of its creation using a Random Forest model with 50 predictors representing properties of the author, the PR, and the project to which the PR is submitted. Method: We analysed 483,988 PRs from 4,218 popular NPM packages and selected a subset of 14 predictors sufficient for a tuned Random Forest model to reach high accuracy. Result: The model achieved an AUC-ROC value of 0.95 in predicting PR acceptance. A model excluding PR properties that change after submission gave an AUC-ROC value of 0.89. We tested the utility of our model in a practical scenario by training it on historical data for the NPM package bootstrap and predicting whether PRs submitted in the future would be accepted. This gave an AUC-ROC value of 0.94 with all 14 predictors, and 0.77 when excluding PR properties that change after creation. Conclusion: PR integrators can use our model for a highly accurate assessment of the quality of open PRs, and PR creators may benefit from the model by understanding which characteristics of their PRs may be undesirable from the integrators' perspective. The model can be implemented as a tool, which we plan to do as future work.
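
The results above are reported as AUC-ROC values. As a reading aid, the following is a minimal sketch of how that metric can be computed from predicted acceptance scores, using its pairwise-ranking interpretation. The function name `auc_roc` and the six labels and scores are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC-ROC of `scores` against binary `labels`.

    Computed as the fraction of (accepted, rejected) pairs in which the
    accepted PR (label 1) has the higher score; tied pairs count as 0.5.
    """
    accepted = [s for y, s in zip(labels, scores) if y == 1]
    rejected = [s for y, s in zip(labels, scores) if y == 0]
    if not accepted or not rejected:
        raise ValueError("need at least one accepted and one rejected PR")

    wins = 0.0
    for a in accepted:
        for r in rejected:
            if a > r:
                wins += 1.0
            elif a == r:
                wins += 0.5
    return wins / (len(accepted) * len(rejected))


# Hypothetical example: 1 = PR accepted within a month, 0 = not accepted.
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical model outputs (predicted probability of acceptance).
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print(f"AUC-ROC: {auc_roc(labels, scores):.3f}")
```

A value of 1.0 means every accepted PR is scored above every rejected one, and 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of PRs, so this sketch suits small illustrations only. For a dataset of the size described above, a sort-based implementation would be the more practical choice.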


Effect of Technical and Social Factors on Pull Request Quality for the NPM Ecosystem

Pull request (PR) based development, which is a norm for the social codi...

Does Code Quality Affect Pull Request Acceptance? An empirical study

Background. Pull requests are a common practice for contributing and rev...

Accepted or Abandoned? Predicting the Fate of Code Changes

Many mature Open-Source Software (OSS), as well as commercial, organizat...

A Machine Learning Approach to Determine the Semantic Versioning Type of npm Packages Releases

Semantic versioning policy is widely used to indicate the level of chang...

Quantum Büchi Automata

This paper defines a notion of quantum Büchi automaton (QBA for short) w...

Predicting Afrobeats Hit Songs Using Spotify Data

This study approached the Hit Song Science problem with the aim of predi...

Patterns of Effort Contribution and Demand and User Classification based on Participation Patterns in NPM Ecosystem

Background: Open source requires participation of volunteer and commerci...
