A Machine Learning Approach to Determine the Semantic Versioning Type of npm Packages Releases

by   Rabe Abdalkareem, et al.

Semantic versioning policy is widely used to indicate the level of changes in a package release. Unfortunately, there are many cases where developers do not respect the semantic versioning policy, leading to the breakage of dependent applications. To reduce such cases, we proposed using machine learning (ML) techniques to effectively predict the new release type, i.e., patch, minor, major, in order to properly determine the semantic versioning type. To perform our prediction, we mined and used a number of features about a release, such as the complexity of the changed code, change types, and development activities. We then used four ML classifiers. To evaluate the performance of the proposed ML classifiers, we conducted an empirical study on 31 JavaScript packages containing a total of approximately 6,260 releases. We started by extracting 41 release level features from historical data of packages' source code and repositories. Then, we used four machine learning classifiers, namely XGBoost, Random Forest, Decision Tree, and Logistic Regression. We found that the XGBoost classifiers performed the best, achieving median ROC AUC values of 0.78, 0.69, and 0.74 for major, minor, and patch releases, respectively. We also found that features related to the change types in a release are the best predictors group of features in determining the semantic versioning type. Finally, we studied the generalizability of determining the semantic versioning type by applying cross-package validation. Our results showed that the general classifier achieved median ROC AUC values of 0.76, 0.69, and 0.75 for major, minor, and patch releases.


page 1

page 2

page 3

page 4


On the evolution of technical lag in the npm package dependency network

Software packages developed and distributed through package managers ext...

An Empirical Study of Yanked Releases in the Rust Package Registry

Cargo, the software packaging manager of Rust, provides a yank mechanism...

I depended on you and you broke me: An empirical study of manifesting breaking changes in client packages

Complex software systems have a network of dependencies. Developers ofte...

An Acoustical Machine Learning Approach to Determine Abrasive Belt Wear of Wide Belt Sanders

This paper describes a machine learning approach to determine the abrasi...

Which Pull Requests Get Accepted and Why? A study of popular NPM Packages

Background: Pull Request (PR) Integrators often face challenges in terms...

Towards a Prediction of Machine Learning Training Time to Support Continuous Learning Systems Development

The problem of predicting the training time of machine learning (ML) mod...

DCoM: A Deep Column Mapper for Semantic Data Type Detection

Detection of semantic data types is a very crucial task in data science ...

Please sign up or login with your details

Forgot password? Click here to reset