On Misbehaviour and Fault Tolerance in Machine Learning Systems

09/16/2021
by Lalli Myllyaho, et al.

Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability, such as reliability and security, of these systems. Systems can be tested and monitored, but this does not provide protection against faults and failures in the adapted ML systems themselves. We studied software designs that aim to introduce fault tolerance into ML systems so that possible problems in the ML components of those systems can be avoided. The research was conducted as a case study, and its data was collected through five semi-structured interviews with experienced software architects. We present a conceptualisation of the misbehaviour of ML systems, the perceived role of fault tolerance, and the designs used. Common patterns for incorporating ML components into designs in a fault-tolerant fashion have started to emerge. ML models are, for example, guarded by monitoring the inputs and their distribution, and by enforcing business rules on acceptable outputs. Multiple specialised ML models are used to adapt to variations and changes in the surrounding world, and simpler fallback techniques, such as default outputs, are put in place to keep systems up and running in the face of problems. However, the general role of these patterns is not widely acknowledged. This is mainly due to the relative immaturity of using ML as part of a complete software system: beyond training, the field still lacks established frameworks and practices for implementing, operating, and maintaining software that utilises ML. ML software engineering needs further analysis and development on all fronts.
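The guarding patterns mentioned in the abstract (checking inputs, enforcing rules on acceptable outputs, and falling back to a default output) can be illustrated with a minimal Python sketch. This is not the design of any system studied in the paper: the class name `GuardedModel`, its fields, the simple range checks, and the reason strings are all hypothetical and chosen only for illustration. In particular, a plain range check stands in for the input and distribution monitoring that the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class GuardedModel:
    """Hypothetical wrapper that guards an ML model with simple checks."""

    model: Callable[[Sequence[float]], float]  # the wrapped ML component
    input_low: float        # assumed lower bound for every input feature
    input_high: float       # assumed upper bound for every input feature
    output_low: float       # assumed business rule: smallest acceptable output
    output_high: float      # assumed business rule: largest acceptable output
    default_output: float   # value served when the model cannot be trusted

    def predict(self, features: Sequence[float]) -> Tuple[float, str]:
        """Return (output, source), where source says who produced the output."""
        # Input guard: reject feature values outside the expected range.
        # NaN fails both comparisons, so it is rejected as well.
        if any(not (self.input_low <= x <= self.input_high) for x in features):
            return self.default_output, "fallback:input"

        # Fault containment: a crashing model must not take the system down.
        try:
            output = self.model(features)
        except Exception:
            return self.default_output, "fallback:error"

        # Output guard: enforce the business rule on acceptable outputs.
        if not (self.output_low <= output <= self.output_high):
            return self.default_output, "fallback:output"

        return output, "model"
```

Returning the source alongside the output is one possible design choice: it lets the surrounding system log how often the fallback is used, which is itself a simple form of monitoring.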
