Helix: Holistic Optimization for Accelerating Iterative Machine Learning

12/14/2018
by Doris Xin, et al.

Machine learning workflow development is a process of trial and error: developers iterate on workflows by testing small modifications until the desired accuracy is achieved. Unfortunately, existing machine learning systems focus narrowly on model training, a small fraction of the overall development time, and neglect iterative development. We propose Helix, a machine learning system that optimizes execution across iterations by intelligently caching and reusing, or recomputing, intermediates as appropriate. Helix captures a wide variety of application needs within its Scala DSL, whose succinct syntax defines unified processes for data preprocessing, model specification, and learning. We demonstrate that the reuse problem can be cast as a Max-Flow problem, while the caching problem is NP-Hard; for the latter, we develop effective lightweight heuristics. Empirical evaluation shows that Helix not only handles a wide variety of use cases in one unified workflow but is also much faster, providing run time reductions of up to 19x over state-of-the-art systems, such as DeepDive or KeystoneML, on four real-world applications in natural language processing, computer vision, and the social and natural sciences.
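
To make the reuse problem concrete, here is a minimal sketch in Python of the decision it involves: for each intermediate in a workflow, choose whether to recompute it, load a cached copy, or skip it. The abstract does not spell out the formulation, so everything here is an assumption made for illustration: the three states (`compute`, `load`, `prune`), the per-node costs, the toy workflow, and the rule that a computed node needs all of its parents available. The sketch enumerates every plan by brute force, which is feasible only for tiny workflows. It stands in for the Max-Flow formulation and is not the authors' algorithm.

```python
from itertools import product

# Hypothetical toy workflow: node -> (parents, compute_cost, load_cost).
# load_cost is None when no cached copy from an earlier iteration exists.
WORKFLOW = {
    "raw": ([], 10.0, None),
    "features": (["raw"], 8.0, 2.0),
    "model": (["features"], 5.0, None),
}
OUTPUTS = {"model"}  # nodes whose results must be available at the end

STATES = ("compute", "load", "prune")


def plan_cost(workflow, outputs, plan):
    """Return the total cost of a plan, or None if the plan is infeasible."""
    total = 0.0
    for node, state in plan.items():
        parents, compute_cost, load_cost = workflow[node]
        if state == "compute":
            # Assumed rule: computing a node needs every parent available,
            # i.e. either computed or loaded, never pruned.
            if any(plan[p] == "prune" for p in parents):
                return None
            total += compute_cost
        elif state == "load":
            if load_cost is None:
                return None  # nothing cached to load
            total += load_cost
        elif node in outputs:
            return None  # required outputs cannot be pruned
    return total


def best_plan(workflow, outputs):
    """Enumerate all 3^n state assignments and keep the cheapest feasible one."""
    nodes = list(workflow)
    best = None
    for states in product(STATES, repeat=len(nodes)):
        plan = dict(zip(nodes, states))
        cost = plan_cost(workflow, outputs, plan)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, plan)
    return best


if __name__ == "__main__":
    cost, plan = best_plan(WORKFLOW, OUTPUTS)
    print(cost, plan)
```

In this toy workflow the cached `features` node is cheaper to load than to rebuild, so the cheapest plan loads it, skips `raw` entirely, and recomputes only `model`. Exhaustive search grows as 3^n in the number of nodes, which is why a polynomial-time formulation such as Max-Flow matters for realistic workflows.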

