The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development

05/22/2019

∙

As machine learning is applied more and more widely, data scientists often struggle to find or create end-to-end machine learning systems for specific tasks. The proliferation of libraries and frameworks and the complexity of the tasks have led to the emergence of "pipeline jungles" -- brittle, ad hoc ML systems. To address these problems, we introduce the Machine Learning Bazaar, a new approach to developing machine learning and AutoML software systems. First, we introduce ML primitives, a unified API and specification for data processing and ML components from different software libraries. Next, we compose primitives into usable ML programs, abstracting away glue code, data flow, and data storage. We further pair these programs with a hierarchy of search strategies -- Bayesian optimization and bandit learning. Finally, we create and describe a general-purpose, multi-task, end-to-end AutoML system that provides solutions to a variety of ML problem types (classification, regression, anomaly detection, graph matching, etc.) and data modalities (image, text, graph, tabular, relational, etc.). We both evaluate our approach on a curated collection of 431 real-world ML tasks and search millions of pipelines, and also demonstrate real-world use cases and case studies.

READ FULL TEXT

The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development

Sign in with Google

Consider DeepAI Pro