An Empirical Analysis of the R Package Ecosystem

02/19/2021
by Ethan Bommarito, et al.

In this research, we present a comprehensive, longitudinal empirical summary of the R package ecosystem, including not just CRAN, but also Bioconductor and GitHub. We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades, providing comprehensive counts and trends for common metrics across packages, releases, authors, licenses, and other important metadata. We find that the historical growth of the ecosystem has been robust under all measures, with a compound annual growth rate of 29% for active packages, 28% for new releases, and 26% for active maintainers. As with many similar social systems, we find a number of highly right-skewed distributions with practical implications, including the distribution of releases per package, packages and releases per author or maintainer, package and maintainer dependency in-degree, and size per package and release. For example, the top five packages are imported by nearly 25% of all packages, and the top ten maintainers support packages that are imported by over half of all packages. We also highlight the dynamic nature of the ecosystem, recording both dramatic acceleration and notable deceleration in the growth of R. From a licensing perspective, we find that a notable majority of packages are distributed under copyleft licensing or omit licensing information entirely. The data, methods, and calculations herein provide an anchor for public discourse and industry decisions related to R and CRAN, serving as a foundation for future research on the R software ecosystem and "data science" more broadly.
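For readers unfamiliar with the growth metric named above, a compound annual growth rate can be sketched as follows. This is a minimal illustration in Python, not the authors' code; the function name `cagr` and the sample counts are assumptions made for the example and are not figures from the paper.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two observations `years` apart.

    Computed as (end / start) ** (1 / years) - 1.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical counts (NOT from the paper): 100 active packages in the
# first year of observation and 1,600 four years later.
rate = cagr(100, 1600, 4)
print(f"Compound annual growth rate: {rate:.0%}")
```

A rate computed this way depends only on the first and last observations, so it smooths over the periods of acceleration and deceleration that the abstract also reports.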


