Empirical Study on the Software Engineering Practices in Open Source ML Package Repositories

12/02/2020
by Minke Xiu, et al.

Recent advances in Artificial Intelligence (AI), especially in Machine Learning (ML), have introduced various practical applications (e.g., virtual personal assistants and autonomous cars) that enhance the experience of everyday users. However, modern ML technologies like Deep Learning require considerable technical expertise and resources to develop, train and deploy models, which makes effective reuse of existing ML models a necessity. This need for discovery and reuse by practitioners and researchers is being addressed by public ML package repositories, which bundle pre-trained models into packages for publication. Since such repositories are a recent phenomenon, there is no empirical data on their current state and challenges. Hence, this paper conducts an exploratory study that analyzes the structure and contents of two popular ML package repositories, TFHub and PyTorch Hub, comparing their information elements (features and policies), package organization, package manager functionalities and usage contexts against those of popular software package repositories (npm, PyPI, and CRAN). Through these studies, we have identified unique software engineering (SE) practices and challenges for sharing ML packages. These findings and implications should be useful for data scientists, researchers and software developers who intend to use these shared ML packages.


Related research

An Exploratory Study on Machine Learning Model Stores (05/25/2019)
Recent advances in Artificial Intelligence, especially in Machine Learni...

Supervised learning with artificial hydrocarbon networks: an open source implementation and its applications (05/20/2020)
Artificial hydrocarbon networks (AHN) is a novel supervised learning met...

Automatic Detection and Analysis of Technical Debts in Peer-Review Documentation of R Packages (01/11/2022)
Technical debt (TD) is a metaphor for code-related problems that arise a...

An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry (03/05/2023)
Deep Neural Networks (DNNs) are being adopted as components in software ...

On the Diversity of Software Package Popularity Metrics: An Empirical Study of npm (01/14/2019)
Software systems often leverage on open source software libraries to reu...

On the Suitability of Hugging Face Hub for Empirical Studies (07/27/2023)
Background. The development of empirical studies in software engineering...

A Survey on Common Threats in npm and PyPi Registries (08/21/2021)
Software engineers regularly use JavaScript and Python for both front-en...