Comparative analysis of real bugs in open-source Machine Learning projects – A Registered Report

09/20/2022
by Tuan Dung Lai, et al.

Background: Machine Learning (ML) systems rely on data to make predictions and have many added components compared to traditional software systems, such as the data processing pipeline, serving pipeline, and model training. Existing research on software maintenance has studied the issue-reporting needs and resolution process for different types of issues, such as performance and security issues. However, ML systems have specific classes of faults, and reporting ML issues requires domain-specific information. Because ML systems and traditional Software Engineering systems have different characteristics, we do not know to what extent their reporting needs differ, or to what extent these differences affect the issue resolution process. Objective: Our objective is to investigate, based on real issue reports in open-source applied ML projects, whether the distribution of resolution time differs between ML and non-ML issues and whether certain categories of ML issues take longer to resolve. We further investigate the size of fix for ML and non-ML issues. Method: We extract issue reports, pull requests, and code files from recently active applied ML projects on GitHub, and use an automatic approach to filter ML and non-ML issues. We manually label the issues using a known taxonomy of deep learning bugs. We measure the resolution time and size of fix of ML and non-ML issues on a controlled sample and compare the distributions for each category of issue.
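
The resolution-time measurement described in the Method can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names (`label`, `created_at`, `closed_at`), the timestamp format, the sample records, and the use of the median as a summary statistic are all assumptions made for illustration, and the abstract does not state which statistical comparison the study uses.

```python
from datetime import datetime
from statistics import median

# Assumed timestamp format (ISO 8601 with a trailing "Z"); hypothetical choice.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def resolution_days(created_at: str, closed_at: str) -> float:
    """Return the days elapsed between issue creation and closure."""
    opened = datetime.strptime(created_at, TIME_FORMAT)
    closed = datetime.strptime(closed_at, TIME_FORMAT)
    return (closed - opened).total_seconds() / 86400


def median_by_group(issues):
    """Group issues by their (hypothetical) 'label' field and return the
    median resolution time in days for each group."""
    groups = {}
    for issue in issues:
        days = resolution_days(issue["created_at"], issue["closed_at"])
        groups.setdefault(issue["label"], []).append(days)
    return {label: median(times) for label, times in groups.items()}


# Made-up records, for illustration only; not data from the study.
issues = [
    {"label": "ml", "created_at": "2022-01-01T00:00:00Z", "closed_at": "2022-01-05T00:00:00Z"},
    {"label": "ml", "created_at": "2022-01-01T00:00:00Z", "closed_at": "2022-01-03T00:00:00Z"},
    {"label": "non-ml", "created_at": "2022-01-01T00:00:00Z", "closed_at": "2022-01-02T00:00:00Z"},
]

print(median_by_group(issues))
```

A median is used here only because resolution times are typically skewed; comparing full distributions, as the Method describes, would need a proper statistical test chosen by the study's authors.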

research
03/11/2023

NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python

Machine learning (ML) has gained much attention and been incorporated in...
research
03/21/2022

Towards a Change Taxonomy for Machine Learning Systems

Machine Learning (ML) research publications commonly provide open-source...
research
12/20/2021

How Do Developers Deal with Security Issue Reports on GitHub?

Security issue reports are the primary means of informing development te...
research
10/13/2021

AI Total: Analyzing Security ML Models with Imperfect Data in Production

Development of new machine learning models is typically done on manually...
research
03/21/2022

Non-Functional Requirements for Machine Learning: An Exploration of System Scope and Interest

Systems that rely on Machine Learning (ML systems) have differing demand...
research
04/06/2023

Tag that issue: Applying API-domain labels in issue tracking systems

Labeling issues with the skills required to complete them can help contr...
research
05/13/2020

Understanding the Nature of System-Related Issues in Machine Learning Frameworks: An Exploratory Study

Modern systems are built using development frameworks. These frameworks ...