Two-stage Optimization for Machine Learning Workflow

07/01/2019
by Alexandre Quemy, et al.

Machine learning techniques play a preponderant role in dealing with massive amounts of data and are employed in almost every possible domain. Building a high-quality machine learning model to be deployed in production is a challenging task for both subject matter experts and machine learning practitioners. For broader adoption and scalability of machine learning systems, the construction and configuration of machine learning workflows need to become more automated. In the last few years, several techniques have been developed in this direction, known as AutoML. In this paper, we present a two-stage optimization process to build data pipelines and configure machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration in order to show the importance of data preprocessing over hyperparameter tuning. The second part presents policies to efficiently allocate search time between data pipeline construction and algorithm configuration. These policies are agnostic to the meta-optimizer. Last, we present a metric to determine whether a data pipeline is specific to or independent of the algorithm, enabling fine-grained pipeline pruning and meta-learning for the cold-start problem.
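To make the two-stage idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a plain random search as the meta-optimizer, a fixed split of the iteration budget as the allocation policy, and a toy scoring function standing in for a cross-validated metric. All names (`PIPELINE_SPACE`, `ALGO_SPACE`, `toy_score`, `two_stage_search`, `pipeline_share`) and all values are hypothetical.

```python
import random

# Hypothetical toy search spaces; names and values are illustrative only.
PIPELINE_SPACE = {"scaler": ["none", "standard", "minmax"], "n_features": [2, 4, 8]}
ALGO_SPACE = {"depth": [1, 2, 4, 8], "lr": [0.01, 0.1, 1.0]}
BASELINE_PIPELINE = {"scaler": "none", "n_features": 2}
DEFAULT_ALGO = {"depth": 1, "lr": 1.0}


def toy_score(pipeline, algo):
    """Stand-in for a validation metric (higher is better); not real data."""
    score = 0.5
    score += {"none": 0.0, "standard": 0.2, "minmax": 0.1}[pipeline["scaler"]]
    score += 0.01 * pipeline["n_features"]
    score += 0.01 * algo["depth"]
    score -= 0.05 * abs(algo["lr"] - 0.1)
    return score


def sample(space, rng):
    """Draw one configuration uniformly at random from a discrete space."""
    return {name: rng.choice(values) for name, values in space.items()}


def two_stage_search(budget, pipeline_share, seed=0):
    """Spend a share of the budget on the pipeline, the rest on the algorithm."""
    rng = random.Random(seed)
    n_pipeline = int(budget * pipeline_share)

    # Stage 1: search data pipelines with the algorithm left at its defaults.
    best_pipeline = dict(BASELINE_PIPELINE)
    best_score = toy_score(best_pipeline, DEFAULT_ALGO)
    for _ in range(n_pipeline):
        candidate = sample(PIPELINE_SPACE, rng)
        score = toy_score(candidate, DEFAULT_ALGO)
        if score > best_score:
            best_pipeline, best_score = candidate, score

    # Stage 2: tune the algorithm with the best pipeline held fixed.
    best_algo = dict(DEFAULT_ALGO)
    for _ in range(budget - n_pipeline):
        candidate = sample(ALGO_SPACE, rng)
        score = toy_score(best_pipeline, candidate)
        if score > best_score:
            best_algo, best_score = candidate, score

    return best_pipeline, best_algo, best_score


if __name__ == "__main__":
    for share in (0.2, 0.5, 0.8):
        print(share, two_stage_search(budget=20, pipeline_share=share))
```

Because the optimizer is only reached through `sample`, the split policy does not depend on which search method is used. This mirrors, in a simplified way, the abstract's point that the allocation policies are agnostic to the meta-optimizer.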


