VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data

by   Van-Duc Le, et al.

An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and then deploying the trained model for inference. When building an end-to-end lifecycle for an ML problem, many ML pipelines must be designed and executed that produce a huge number of lifecycle versions. Therefore, this paper introduces VeML, a Version management system dedicated to end-to-end ML Lifecycle. Our system tackles several crucial problems that other systems have not solved. First, we address the high cost of building an ML lifecycle, especially for large-scale and high-dimensional dataset. We solve this problem by proposing to transfer the lifecycle of similar datasets managed in our system to the new training data. We design an algorithm based on the core set to compute similarity for large-scale, high-dimensional data efficiently. Another critical issue is the model accuracy degradation by the difference between training data and testing data during the ML lifetime, which leads to lifecycle rebuild. Our system helps to detect this mismatch without getting labeled data from testing data and rebuild the ML lifecycle for a new data version. To demonstrate our contributions, we conduct experiments on real-world, large-scale datasets of driving images and spatiotemporal sensor data and show promising results.


page 1

page 4

page 5

page 6

page 8

page 11

page 13

page 15


Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

Machine learning has been proven to be effective in various application ...

On the Choice of Data for Efficient Training and Validation of End-to-End Driving Models

The emergence of data-driven machine learning (ML) has facilitated signi...

Datasheets for Machine Learning Sensors

Machine learning (ML) sensors offer a new paradigm for sensing that enab...

Translation Between Waves, wave2wave

The understanding of sensor data has been greatly improved by advanced d...

On the Robustness of Deep Learning-predicted Contention Models for Network Calculus

The network calculus (NC) analysis takes a simple model consisting of a ...

Many Episode Learning in a Modular Embodied Agent via End-to-End Interaction

In this work we give a case study of an embodied machine-learning (ML) p...

Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

To break the bottlenecks of mainstream cloud-based machine learning (ML)...

Please sign up or login with your details

Forgot password? Click here to reset