An Empirical Analysis of the R Package Ecosystem

02/19/2021
by Ethan Bommarito, et al.

In this research, we present a comprehensive, longitudinal empirical summary of the R package ecosystem, including not just CRAN, but also Bioconductor and GitHub. We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades, providing counts and trends for common metrics across packages, releases, authors, licenses, and other important metadata. We find that the historical growth of the ecosystem has been robust under all measures, with a compound annual growth rate of 29% for packages, 28% [...]. As with similar social systems, we find a number of highly right-skewed distributions with practical implications, including the distribution of releases per package, packages and releases per author or maintainer, package and maintainer dependency in-degree, and size per package and release. For example, the top five packages are imported by nearly 25% of all packages, and [...] maintainers support packages that are imported by over half of all packages. We also highlight the dynamic nature of the ecosystem, recording both dramatic acceleration and notable deceleration in the growth of R. From a licensing perspective, we find that a notable majority of packages are distributed under copyleft licensing or omit licensing information entirely. The data, methods, and calculations herein provide an anchor for public discourse and industry decisions related to R and CRAN, serving as a foundation for future research on the R software ecosystem and "data science" more broadly.
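For readers unfamiliar with the metric, the compound annual growth rate quoted in the abstract can be computed as in the minimal sketch below. The function name `cagr` and the sample counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper, and the authors' exact calculation may differ.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.29 means 29% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # Constant yearly rate r such that start_value * (1 + r) ** years == end_value.
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical counts for illustration only; not data from the paper.
example_rate = cagr(start_value=100, end_value=1000, years=10)
print(f"{example_rate:.1%}")
```

A tenfold increase over ten years corresponds to a rate of roughly 26% per year, which gives a sense of scale for the figures reported above.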


07/25/2019

An Empirical Analysis of the Python Package Index (PyPI)

In this research, we provide a comprehensive empirical summary of the Py...
03/14/2021

Binary R Packages for Linux: Past, Present and Future

Pre-compiled binary packages provide a very convenient way of efficientl...
01/27/2022

An Empirical Study of Yanked Releases in the Rust Package Registry

Cargo, the software packaging manager of Rust, provides a yank mechanism...
04/09/2022

What are the characteristics of highly-selected packages? A case study on the npm ecosystem

With the popularity of software ecosystems, the number of open source co...
02/11/2021

I Know What You Imported Last Summer: A study of security threats in the Python ecosystem

The popularity of Python has risen rapidly over the past 15 years. It is...
08/03/2018

DataDeps.jl: Repeatable Data Setup for Replicable Data Science

We present DataDeps.jl: a julia package for the reproducible handling of...
09/02/2020

covid19.analytics: An R Package to Obtain, Analyze and Visualize Data from the Corona Virus Disease Pandemic

With the emergence of a new pandemic worldwide, a novel strategy to appr...