Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and Disengagement

06/28/2023
by   Kai Gao, et al.

Deep learning (DL) package supply chains (SCs) are critical for DL frameworks to remain competitive. However, vital knowledge about the nature of DL package SCs is still lacking. In this paper, we explore the domains, clusters, and disengagement of packages in two representative PyPI DL package SCs to bridge this knowledge gap. We analyze the metadata of nearly six million PyPI package distributions and construct version-sensitive SCs for two popular DL frameworks: TensorFlow and PyTorch. We find that popular packages (measured by the number of monthly downloads) in the two SCs cover 34 domains belonging to eight categories. The Applications, Infrastructure, and Sciences categories account for over 85% [...] have developed specializations on Infrastructure and Applications packages, respectively. We employ the Leiden community detection algorithm and detect 131 and 100 clusters in the two SCs. The clusters mainly exhibit four shapes, in order of increasing dependency complexity: Arrow, Star, Tree, and Forest. Most clusters are Arrow or Star, but Tree and Forest clusters account for most packages (TensorFlow SC: 70% [...]). [...] reasons why packages disengage from the SC (i.e., remove the DL framework and its dependents from their installation dependencies): dependency issues, functional improvements, and ease of installation. The most common disengagement reason differs between the two SCs. Our study provides rich implications for the maintenance and dependency management practices of PyPI DL SCs.
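To make the notions of SC membership and disengagement concrete, here is a minimal Python sketch. It is not the authors' pipeline: the package names and versions are hypothetical toy data, dependencies are matched by package name only (a simplification of the version-sensitive SCs described above), and releases are assumed to be listed in chronological order. A release counts as part of the SC if it depends on the framework or on any package already in the SC; a package counts as disengaged if an earlier release was in the SC but its latest release is not.

```python
# Hypothetical toy metadata: (package, version) -> direct installation dependencies.
# Insertion order is assumed to be chronological for each package.
RELEASES = {
    ("tensorflow", "2.0"): [],
    ("keras-utils", "1.0"): ["tensorflow"],
    ("keras-utils", "2.0"): ["tensorflow"],
    ("vision-kit", "1.0"): ["keras-utils"],
    ("vision-kit", "2.0"): ["numpy"],  # framework dependent dropped here
    ("numpy", "1.0"): [],
}


def supply_chain(releases, framework):
    """Return the set of (package, version) releases that depend,
    directly or transitively, on `framework` (name-level matching)."""
    sc_packages = {framework}
    members = set()
    changed = True
    while changed:  # iterate to a fixed point to pick up transitive dependents
        changed = False
        for (name, version), deps in releases.items():
            if name == framework or (name, version) in members:
                continue
            if any(dep in sc_packages for dep in deps):
                members.add((name, version))
                sc_packages.add(name)
                changed = True
    return members


def disengaged_packages(releases, members):
    """Return packages with at least one release in the SC whose
    latest release is no longer in the SC."""
    latest = {}
    ever_in_sc = set()
    for name, version in releases:
        latest[name] = (name, version)  # later entries overwrite earlier ones
        if (name, version) in members:
            ever_in_sc.add(name)
    return sorted(n for n in ever_in_sc if latest[n] not in members)


members = supply_chain(RELEASES, "tensorflow")
print(sorted(members))
print(disengaged_packages(RELEASES, members))
```

In this toy data, `vision-kit` enters the SC through `keras-utils` and leaves it when its second release replaces that dependency. A faithful reconstruction would resolve version constraints for every edge rather than matching on names.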
