An Empirical Analysis of the Python Package Index (PyPI)

07/25/2019
by Ethan Bommarito, et al.

In this research, we provide a comprehensive empirical summary of the Python Package Index, PyPI, including both package metadata and source code covering 178,592 packages, 1,745,744 releases, 76,997 contributors, and 156,816,750 import statements. We provide counts and trends for packages, releases, dependencies, category classifications, licenses, and package imports, as well as for authors, maintainers, and organizations. As one of the largest and oldest software repositories at the time of publication, PyPI provides insight not just into the Python ecosystem today, but also into broader trends in software development and licensing over time. Within PyPI, we find that the repository has grown robustly under all measures, with a compound annual growth rate of 18.3% for active packages, 15.5% for new authors, and 27% for new import statements over the last 15 years. As with many similar social systems, we find a number of highly right-skewed distributions, including the distributions of releases per package, packages and releases per author, imports per package, and size per package and release. However, we also find that most packages are contributed by single individuals rather than by multiple individuals or organizations. The data, methods, and calculations herein provide an anchor for public discourse on PyPI and serve as a foundation for future research on the Python software ecosystem.
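The growth figures above are stated as compound annual growth rates (CAGR). The abstract does not spell out the exact start and end points used, so the sketch below only assumes the standard definition, CAGR = (end / start)^(1 / years) - 1, and shows what a rate such as 18.3% sustained over 15 years implies as a total multiple (roughly a 12-fold increase). The function names `cagr` and `growth_multiple` are hypothetical helpers for illustration, not code from the paper.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two positive values observed `years` apart.

    Assumes the standard definition (end / start) ** (1 / years) - 1; the paper's
    exact windowing and definition of "active" packages are not reproduced here.
    """
    if start_value <= 0 or end_value <= 0:
        raise ValueError("start_value and end_value must be positive")
    if years <= 0:
        raise ValueError("years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_multiple(rate: float, years: float) -> float:
    """Total growth factor implied by a constant annual rate held for `years`."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    # Toy numbers (hypothetical, not from the paper): 100 -> 400 over 2 years.
    print(cagr(100, 400, 2))  # 1.0, i.e. 100% growth per year
    # Total multiple implied by an 18.3% CAGR sustained for 15 years.
    print(round(growth_multiple(0.183, 15), 1))
```

Because CAGR compounds, a rate that looks modest year to year translates into order-of-magnitude growth over a 15-year window, which is why the paper's per-year rates correspond to a much larger repository today than at the start of the period.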