Scanflow: A multi-graph framework for Machine Learning workflow management, supervision, and debugging

11/04/2021
by Gusseppe Bravo-Rocca, et al.

Machine Learning (ML) is more than just training models; the whole workflow must be considered. Once deployed, an ML model needs to be continuously supervised and debugged to guarantee its validity and robustness in unexpected situations. Debugging in ML aims to identify (and address) model weaknesses in non-trivial contexts. Several techniques have been proposed to identify different types of model weaknesses, such as bias in classification, model decay, and adversarial attacks, yet there is no generic framework that allows them to work in a collaborative, modular, portable, and iterative way and, more importantly, that is flexible enough to allow both human- and machine-driven techniques. In this paper, we propose a novel containerized directed graph framework to support and accelerate end-to-end ML workflow management, supervision, and debugging. The framework allows defining and deploying ML workflows in containers, tracking their metadata, checking their behavior in production, and improving the models by using both learned and human-provided knowledge. We demonstrate these capabilities by integrating into the framework two hybrid systems for detecting data distribution drift. These systems identify the samples that are far from the latent space of the original distribution, ask for human intervention, and either retrain the model or wrap it with a filter to remove the noise of corrupted data at inference time. We test these systems on the MNIST-C, CIFAR-10-C, and FashionMNIST-C datasets, obtaining promising accuracy results with the help of human involvement.
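
To make the drift-detection idea concrete, the following is a minimal sketch and not Scanflow's actual API or the authors' implementation. It assumes a linear latent space obtained with PCA (a stand-in for whatever learned encoder the paper uses) and flags a sample as drifted when its reconstruction error exceeds a quantile threshold fitted on the training data. The names `LatentDriftFilter` and `predict_with_filter`, along with the `n_components` and `quantile` parameters, are hypothetical and chosen only for illustration.

```python
import numpy as np


class LatentDriftFilter:
    """Flags samples that lie far from a PCA latent space fitted on training data.

    Assumption: PCA reconstruction error is used as the distance to the
    latent space; the paper's systems may use a different encoder.
    """

    def __init__(self, n_components=2, quantile=0.99):
        self.n_components = n_components
        self.quantile = quantile

    def fit(self, X):
        # Center the data and keep the top principal directions.
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        # Threshold: a high quantile of the training reconstruction errors.
        self.threshold_ = np.quantile(self.reconstruction_error(X), self.quantile)
        return self

    def reconstruction_error(self, X):
        centered = X - self.mean_
        latent = centered @ self.components_.T      # project into latent space
        reconstructed = latent @ self.components_   # map back to input space
        return np.linalg.norm(centered - reconstructed, axis=1)

    def is_drifted(self, X):
        return self.reconstruction_error(X) > self.threshold_


def predict_with_filter(model, drift_filter, X):
    """Wraps a model so that drifted samples are withheld from prediction.

    Returns (predictions, drifted_mask). Predictions are None for drifted
    samples, which could instead be queued for human review.
    """
    drifted = drift_filter.is_drifted(X)
    predictions = np.full(len(X), None, dtype=object)
    if (~drifted).any():
        predictions[~drifted] = model(X[~drifted])
    return predictions, drifted
```

In this sketch, the samples marked in `drifted_mask` correspond to the ones a human would be asked to inspect before deciding whether to retrain the model or keep the filter in place at inference time.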


