Bugs in Machine Learning-based Systems: A Faultload Benchmark

06/24/2022
by Mohammad Mehdi Morovati, et al.

The rapid growth of Machine Learning (ML) applications across various domains has drawn increasing attention to the quality of ML components. Consequently, a growing number of techniques and tools aim to improve the quality of ML components and integrate them safely into ML-based systems. Although most of these tools leverage the lifecycle of bugs, there is no standard benchmark of bugs against which to assess their performance, compare them, and discuss their strengths and weaknesses. In this study, we first investigate the reproducibility and verifiability of bugs in ML-based systems and identify the most important factors affecting each. We then explore the challenges of constructing a benchmark of bugs in ML-based software systems and present defect4ML, a bug benchmark that satisfies the criteria of a standard benchmark: relevance, reproducibility, fairness, verifiability, and usability. This faultload benchmark contains 113 bugs reported by ML developers on GitHub and Stack Overflow for two of the most popular ML frameworks, TensorFlow and Keras. defect4ML also addresses important challenges in Software Reliability Engineering of ML-based software systems: 1) fast-changing frameworks, by providing bugs for different framework versions; 2) code portability, by delivering similar bugs across ML frameworks; 3) bug reproducibility, by providing fully reproducible bugs with complete information about the required dependencies and data; and 4) lack of detailed information on bugs, by linking each bug to its origin. defect4ML can be of interest to practitioners and researchers working on ML-based systems who wish to assess their testing tools and techniques.
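
To make concrete the kind of framework bug such a benchmark catalogs, the sketch below reproduces a common Keras mistake frequently reported on Stack Overflow: training with categorical_crossentropy on integer class labels, although that loss expects one-hot encoded targets. The example is purely illustrative and is not taken from defect4ML; the data, layer sizes, and hyperparameters are hypothetical.

    # Illustrative only; not an entry from defect4ML. A common Keras bug of the
    # kind reported on Stack Overflow: integer labels used with a loss that
    # expects one-hot encoded targets.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Toy data: 100 samples, 20 features, 3 classes given as integer labels 0..2.
    x_train = np.random.rand(100, 20).astype("float32")
    y_train = np.random.randint(0, 3, size=(100,))

    model = keras.Sequential([
        layers.Input(shape=(20,)),
        layers.Dense(16, activation="relu"),
        layers.Dense(3, activation="softmax"),
    ])

    # Buggy version: fails during fit() with a shape mismatch, because
    # categorical_crossentropy expects one-hot targets of shape (batch, 3).
    # model.compile(optimizer="adam", loss="categorical_crossentropy")

    # Fix: sparse_categorical_crossentropy accepts integer labels directly.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, batch_size=16, verbose=0)

A benchmark entry would additionally pin the exact framework versions and ship the data needed to trigger the failure, which is what the abstract refers to as fully reproducible bugs with complete dependency and data information.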

Related research

08/11/2021  Why are Some Bugs Non-Reproducible? An Empirical Investigation using Data Fusion
Software developers attempt to reproduce software bugs to understand the...

09/09/2021  The challenge of reproducible ML: an empirical study on the impact of bugs
Reproducibility is a crucial requirement in scientific research. When re...

04/27/2022  Prescriptive and Descriptive Approaches to Machine-Learning Transparency
Specialized documentation techniques have been developed to communicate ...

09/05/2019  TFCheck: A TensorFlow Library for Detecting Training Issues in Neural Network Programs
The increasing inclusion of Machine Learning (ML) models in safety criti...

08/16/2018  Identifying Implementation Bugs in Machine Learning based Image Classifiers using Metamorphic Testing
We have recently witnessed tremendous success of Machine Learning (ML) i...

05/13/2020  Understanding the Nature of System-Related Issues in Machine Learning Frameworks: An Exploratory Study
Modern systems are built using development frameworks. These frameworks ...

06/10/2023  An Empirical Study of Bugs in Quantum Machine Learning Frameworks
Quantum computing has emerged as a promising domain for the machine lear...
