The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development

05/22/2019
by Micah J. Smith, et al.

As machine learning is applied more and more widely, data scientists often struggle to find or create end-to-end machine learning systems for specific tasks. The proliferation of libraries and frameworks, together with the complexity of the tasks, has led to the emergence of "pipeline jungles": brittle, ad hoc ML systems. To address these problems, we introduce the Machine Learning Bazaar, a new approach to developing machine learning and AutoML software systems. First, we introduce ML primitives, a unified API and specification for data processing and ML components from different software libraries. Next, we compose primitives into usable ML programs, abstracting away glue code, data flow, and data storage. We further pair these programs with a hierarchy of search strategies: Bayesian optimization and bandit learning. Finally, we create and describe a general-purpose, multi-task, end-to-end AutoML system that provides solutions to a variety of ML problem types (classification, regression, anomaly detection, graph matching, etc.) and data modalities (image, text, graph, tabular, relational, etc.). We evaluate our approach on a curated collection of 431 real-world ML tasks, searching millions of pipelines in the process, and also demonstrate real-world use cases and case studies.
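The abstract describes three layers: primitives that share one API, pipelines that hide the data flow between them, and a search hierarchy in which bandit learning and Bayesian optimization work together. The toy sketch below illustrates how those layers could fit together, under our own assumptions. `Primitive`, `Pipeline`, the two templates, and the data are all hypothetical and are not the API of the authors' libraries. UCB1 stands in as one standard bandit algorithm, and a single random-search draw per round replaces Bayesian optimization to keep the code stdlib-only.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]


@dataclass
class Primitive:
    """Hypothetical unified API: every component exposes fit() and produce()."""
    name: str
    produce_fn: Callable[[Context, Context], Context]  # (state, context) -> new entries
    fit_fn: Optional[Callable[[Context], Context]] = None  # context -> learned state
    state: Context = field(default_factory=dict)

    def fit(self, context: Context) -> None:
        if self.fit_fn is not None:  # stateless primitives skip fitting
            self.state = self.fit_fn(context)

    def produce(self, context: Context) -> Context:
        return self.produce_fn(self.state, context)


class Pipeline:
    """Runs primitives in order; a shared context replaces hand-written glue code."""

    def __init__(self, primitives: List[Primitive]):
        self.primitives = primitives

    def fit(self, **data: Any) -> None:
        context = dict(data)
        for p in self.primitives:
            p.fit(context)
            context.update(p.produce(context))  # outputs feed the next primitive

    def predict(self, **data: Any) -> List[int]:
        context = dict(data)
        for p in self.primitives:
            context.update(p.produce(context))
        return context["y_pred"]


def center_threshold(offset: float) -> Pipeline:
    """Hypothetical template 1: mean-centering, then a class-midpoint threshold + offset."""
    def fit_threshold(c: Context) -> Context:
        pos = [x for x, y in zip(c["X"], c["y"]) if y == 1]
        neg = [x for x, y in zip(c["X"], c["y"]) if y == 0]
        return {"thr": (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + offset}

    return Pipeline([
        Primitive("center", lambda s, c: {"X": [x - s["mean"] for x in c["X"]]},
                  lambda c: {"mean": sum(c["X"]) / len(c["X"])}),
        Primitive("threshold", lambda s, c: {"y_pred": [int(x > s["thr"]) for x in c["X"]]},
                  fit_threshold),
    ])


def majority(_: float) -> Pipeline:
    """Hypothetical template 2: predict the most common training label (no hyperparameters)."""
    return Pipeline([
        Primitive("majority", lambda s, c: {"y_pred": [s["label"]] * len(c["X"])},
                  lambda c: {"label": max(set(c["y"]), key=c["y"].count)}),
    ])


# name -> (pipeline factory, hyperparameter range); all values are illustrative
TEMPLATES: Dict[str, Tuple[Callable[[float], Pipeline], Tuple[float, float]]] = {
    "center+threshold": (center_threshold, (-3.0, 3.0)),
    "majority": (majority, (0.0, 0.0)),
}
TRAIN = {"X": [1, 2, 3, 4, 6, 7, 8, 9], "y": [0, 0, 0, 0, 1, 1, 1, 1]}
VAL = {"X": [0, 4.5, 5.5, 10], "y": [0, 0, 1, 1]}


def ucb1_search(templates, train, val, rounds, seed=0):
    """UCB1 picks a template; one random-search draw (a stand-in for Bayesian
    optimization) picks its hyperparameter; validation accuracy is the reward."""
    rng = random.Random(seed)
    counts = {name: 0 for name in templates}
    totals = {name: 0.0 for name in templates}
    best: Tuple[float, Optional[str], Optional[float]] = (-1.0, None, None)
    for _ in range(rounds):
        untried = [n for n in templates if counts[n] == 0]
        n_total = sum(counts.values())
        name = untried[0] if untried else max(
            templates,
            key=lambda n: totals[n] / counts[n] + math.sqrt(2 * math.log(n_total) / counts[n]),
        )
        factory, (lo, hi) = templates[name]
        param = rng.uniform(lo, hi)
        pipe = factory(param)
        pipe.fit(**train)
        preds = pipe.predict(X=val["X"])
        score = sum(p == y for p, y in zip(preds, val["y"])) / len(val["y"])
        counts[name] += 1
        totals[name] += score
        if score > best[0]:
            best = (score, name, param)
    return best, counts
```

Calling `ucb1_search(TEMPLATES, TRAIN, VAL, rounds=40)` on this toy data makes the bandit spend most of its budget on the centering template, because its average validation accuracy is higher. In a real system, each arm would presumably be a full pipeline template with many hyperparameters, and the inner tuner would be an actual Bayesian optimizer rather than random sampling.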

research
08/11/2018

MARVIN: An Open Machine Learning Corpus and Environment for Automated Machine Learning Primitive Annotation and Execution

In this demo paper, we introduce the DARPA D3M program for automatic mac...
research
05/30/2022

Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

To break the bottlenecks of mainstream cloud-based machine learning (ML)...
research
08/31/2021

Towards Observability for Machine Learning Pipelines

Software organizations are increasingly incorporating machine learning (...
research
05/01/2020

PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines

In recent years, a wide variety of automated machine learning (AutoML) m...
research
01/29/2018

Search Based Code Generation for Machine Learning Programs

Machine Learning (ML) has revamped every domain of life as it provides p...
research
06/28/2017

autoBagging: Learning to Rank Bagging Workflows with Metalearning

Machine Learning (ML) has been successfully applied to a wide range of d...
research
02/21/2022

ICSML: Industrial Control Systems Machine Learning inference framework natively executing on IEC 61131-3 languages

Industrial Control Systems (ICS) have played a catalytic role in enablin...