
Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering

10/09/2019
by Renan Souza, et al.

Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes infeasible to recreate an ML model from scratch or to explain to stakeholders how it was created. The main limitation of existing provenance tracking solutions is that they cannot capture and integrate the provenance of domain and ML data processed across the multiple workflows in the lifecycle while keeping the capture overhead low. To address this problem, this paper contributes a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows, both to address the characteristics of ML and CSE and to allow provenance queries with a standard vocabulary. We show a practical use in a real case in the Oil and Gas industry, along with its evaluation using 48 GPUs in parallel.
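
To make the idea of tracking data across workflows concrete, the following is a minimal sketch of a provenance record that uses only core W3C PROV concepts (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`). It is not PROV-ML and not the system described in the paper: the `ProvGraph` class, the node identifiers, and the attribute names are hypothetical, chosen only to illustrate how domain data and ML data from two workflows might be linked so that a trained model can be traced back to its inputs.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Tiny in-memory store of PROV-style statements (illustrative only)."""
    nodes: dict = field(default_factory=dict)      # id -> (PROV type, attributes)
    relations: list = field(default_factory=list)  # (subject, relation, object)

    def add(self, node_id, prov_type, **attrs):
        self.nodes[node_id] = (prov_type, attrs)
        return node_id

    def relate(self, subject, relation, obj):
        self.relations.append((subject, relation, obj))

    def lineage(self, node_id):
        """Return the ids of all nodes that node_id transitively depends on."""
        seen, stack = set(), [node_id]
        while stack:
            current = stack.pop()
            for subj, _, obj in self.relations:
                if subj == current and obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
        return seen


def record_training_run():
    """Record a hypothetical two-workflow lifecycle: data preparation, then learning."""
    g = ProvGraph()
    # Entities: one domain dataset, one ML dataset, one trained model (all hypothetical).
    raw = g.add("raw_domain_data", "prov:Entity", kind="domain data")
    training_set = g.add("training_set", "prov:Entity", kind="ML data")
    model = g.add("model_v1", "prov:Entity", kind="trained model")
    # Activities: one per workflow; the hyperparameter value is an assumed example.
    prep = g.add("preprocess_run", "prov:Activity", workflow="data preparation")
    fit = g.add("train_run", "prov:Activity", workflow="learning", learning_rate=0.001)
    scientist = g.add("scientist", "prov:Agent")
    # Relations follow the PROV direction: activity used entity, entity wasGeneratedBy activity.
    g.relate(prep, "prov:used", raw)
    g.relate(training_set, "prov:wasGeneratedBy", prep)
    g.relate(fit, "prov:used", training_set)
    g.relate(model, "prov:wasGeneratedBy", fit)
    g.relate(fit, "prov:wasAssociatedWith", scientist)
    return g


if __name__ == "__main__":
    graph = record_training_run()
    print(sorted(graph.lineage("model_v1")))
```

Calling `lineage("model_v1")` walks the relations from the model back through both workflows, which is the kind of question (how was this model created, and from which data?) that a provenance query is meant to answer. A real deployment would use a persistent store and a standard vocabulary rather than an in-memory list.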

